Australia: show the
world what climate 
action looks like 


Scott Morrison’s government must act on 
overwhelming evidence and public opinion. 


Last November, as bush fires began to roar across
large swathes of Australia, people started to ask: 
could such an extreme event be connected to 
climate change? 

Prime Minister Scott Morrison dodged the 
question. Gladys Berejiklian, the premier of the state of 
New South Wales, where the fires have had the biggest 
impact, said that during the unfolding disaster was not 
the time to talk about climate change. Two months on, 
this season’s devastating conflagrations have killed at 
least 28 people and an estimated one billion native ani- 
mals; burnt about 10 million hectares of vegetation; and 
destroyed more than 2,000 homes. 

The top priority is to protect lives and ecosystems. But 
the nation’s leaders must surely realize that they not only 
need to talk about climate change, but also need to act 
decisively to reduce the emissions that are driving it. 

Australia’s leaders have known for many years that 
climate change would make bush fires worse. They were 
warned in an independent report commissioned by the 
national and state governments in 2008 that from 2020 
onwards, fire seasons would start earlier, end later and 
be more intense. 

But as Nature has frequently reported, the country’s 
politicians delayed meaningful action through a wasted 
decade of arguments over whether human activities are 
causing climate change — in the face of overwhelming 
scientific evidence that they are. Undoubtedly, one reason 
for this is that Australia — which is the world’s largest coal 
exporter — has repeatedly prioritized the coal industry’s 
needs over the planet’s. 


Not enough 

The government now says it is on track to reduce green- 
house-gas emissions by 26-28% of 2005 levels by 2030, 
to meet its commitment under the 2015 Paris climate 
agreement. Its plan includes a policy to pay farmers and 
businesses to restore or protect native vegetation, and a
programme to encourage energy efficiency. 

But commitments on such a scale, whether from Australia
or other countries, are insufficient to limit warming
to below 2 °C above pre-industrial levels, the goal of
the agreement. And a significant portion of Australia’s 
planned cuts is to be achieved through accounting tricks, 
rather than actual emissions reductions. The government 
plans for around half (367 million tonnes of greenhouse
gases) to come from ‘credits’ it accumulated by surpassing
its targets under the previous climate agreement, the 1997
Kyoto Protocol. That means its actual cuts will be 15% from
2005 levels. No other high-income country that has signed
the Paris agreement has said it will transfer its Kyoto credits
in this way, and nor should Australia.

Australian Prime Minister Scott Morrison visiting a fire-hit area in Victoria.

Last week, after international outrage over his lack of 
leadership, Morrison switched gears. He started talking 
about how, as a result of the catastrophic fires, the govern- 
ment would focus on actions that build resilience and adap- 
tation to extreme events, such as bush fires, heatwaves 
and droughts. 

For Australia, that’s a significant move — but it is not 
enough. The government has to do much more to cut its 
emissions, too. Just reacting to the impacts of climate 
change without addressing the cause is like treating peo- 
ple for lung cancer while continuing to let them smoke. 

Australia’s tragedy is that more-extreme fires are already 
forecast. Centuries of greenhouse-gas emissions have 
locked the world into several decades of warming, even 
if global emissions were to drop to zero now. If the Morri- 
son government continues its current trajectory, then the 
country is likely to experience even more severe droughts 
and fires. 

The Morrison government has to make a choice: does it 
want Australians to live with fires that are becoming worse 
than those in the past but which can still be managed to 
some extent? Or does it want to put citizens at risk of future 
fire conditions that are even more catastrophic than this 
season’s? There can be only one answer to this question 
if the government accepts that its first role is always to 
protect its citizens and its country. 

We frequently hear the argument that actions from indi- 
vidual countries such as Australia will, on their own, make 
little difference to global warming. But that is why we have 
global agreements. Change will come when everyone acts 
in concert. Australia, along with the United States, China, 
the European Union and others all have to play their part, 
leading the way on decarbonizing energy for households,
industry, transport and more. 

Instead of arguing with its climate researchers, Austral- 
ia’s government needs to work with them to accelerate this 
transition, and to ensure that, as far as possible, lives and 
livelihoods are protected when change arrives. A country 
on the front lines of climate change has no other choice.


Stop the Wuhan 
coronavirus 


Vigilance, preparedness, speed, transparency 
and global coordination are now crucial to 
preventing a new infectious disease from 
becoming a global emergency. 


As hundreds of millions of people in China take
to the roads, railway and skies to be with their
families for the new year holidays, authorities
in the country and around the world have
mounted an enormous operation to track and
screen travellers from Wuhan in central China.

This follows the outbreak of a mysterious pneumonia-like 
coronavirus, first reported on the last day of December 
2019, that has so far claimed six lives in China. The World 
Health Organization is deciding whether to declare the 
situation an international public-health emergency. 

The virus has been spreading. On 21 January, as Nature 
went to press, there were almost 300 reported cases — 
seven times the figure stated five days earlier. Over the 
past week, authorities in South Korea, Thailand and Japan 
have also reported cases. Researchers at Imperial College 
London who have modelled the outbreak on the basis of 
estimates of travel out of Wuhan say the virus might have 
infected as many as 1,700 people. 

The virus, which still lacks a formal name, is being called 
2019-nCoV. It is a relative of both the deadly severe acute
respiratory syndrome (SARS) and the Middle East respira- 
tory syndrome (MERS) viruses. People with the virus report 
a fever along with other symptoms of lower-respiratory 
infection such as a cough or breathing difficulties. The first
people infected in China are understood to have caught the 
virus in one of Wuhan’s live animal and seafood markets 
— probably from an animal. Some 95% of the total cases, 
including those in Japan, South Korea and Thailand, also 
involved people who had been to Wuhan. 

The virus has not been found in humans before and 
knowledge of how it is spread is still evolving. Last week, 
government officials and researchers in China who are 
tracking the virus told Nature they didn’t think it spreads 
readily from human to human, at least not as fast as SARS. 
But this view is being revised following the intervention of 
SARS specialist Zhong Nanshan. After a visit to Wuhan on 
20 January, Zhong, who directs the State Key Laboratory of
Respiratory Disease in Guangzhou, confirmed that 14 med- 
ical workers had been infected by one virus carrier, raising 
concern that some people might be ‘super-spreaders’ of 
the virus. Stopping the further spread of the disease out of 
Wuhan, possibly by banning infected people from leaving 
Wuhan, has to be a top priority, he said.

China’s health authorities and the government have been 
moving quickly. Also on 20 January, the national broadcaster
reported that President Xi Jinping had ordered
that the virus be “resolutely contained”, and Premier Li 
Keqiang announced a steering group to tackle disease 
spread. At the beginning of the month, local authorities 
in Wuhan closed and disinfected the animal market, and 
health authorities have reported the results of their disease 
surveillance efforts. 

Researchers, too, have had a crucial role, in publishing 
and sharing genome sequences. Four different research 
groups sequenced the genomes of six virus samples — and 
analyses of all six agree that the virus is a relative of SARS. 
Researchers are to be commended for making sequence 
data available, and they should continue to do so. (Release 
of such data, as well as deposition of manuscripts on pre- 
print servers, will not affect the consideration of papers 
submitted to Nature.) 

As China’s government has recognized, the authorities 
fumbled in their response to SARS, which spread globally, 
killing more than 770 people in 2002-03. Fifteen per cent 
of those infected died, a rate that seems much higher than
that of the current outbreak — at least from what is known 
so far. In contrast to SARS, the response this time has been 
faster, more assured and more transparent. 

But there is still much to do, and quickly. The virus’s 
original source must be confirmed — something that is 
proving difficult. Researchers have found virus traces 
in swabs taken from the animal market. The authorities, 
rightly, made closing and sterilizing the market their first 
priority, but in their rush to do so they might have missed 
a chance to test the animals. In the case of SARS, we now
know that bats transmitted the virus to other animals, 
which then passed it to humans. Other questions include 
confirming the method of transmission for new cases, as 
well as understanding the virus’s ability to cause serious 
illness. Virus genomes from infected people will need to be 
sequenced continually to understand the extent to which 
the virus is evolving. 

China’s health authorities did well to act more quickly 
than in the past. Now, they must continue to report what 
they know and what more they are uncovering. The emerging
situation requires global co-ordination and leadership
from the World Health Organization, with the support of 
public-health agencies worldwide. Researchers must work 
fast, collaboratively and transparently to address the key 
research questions. The world has had plenty of practice 
with SARS and avian flu — we should know what to do. 

Around 7 million people are preparing to fly from China 
to 400 cities in 100 countries to celebrate the Chinese New
Year. Now is the time to stop this outbreak spiralling into 
a global health emergency. 


A personal take on science and society 


World view 


By Vinay Prasad 


Our best weapons against 
cancer are not magic bullets 


Better health and social policy would save 
more lives than sophisticated drugs. 


Earlier this month, the American Cancer Society
announced its latest figures on cancer incidence
and mortality (R. L. Siegel et al. CA Cancer J. Clin.
70, 7-30; 2020). These included the largest drop
ever observed in national cancer statistics, which
several media outlets seized on. Cancer death rates in the 
United States peaked in 1990, and in 2008-17 fell by about
1.5% per year. Between 2016 and 2017, the drop was slightly 
larger: 2.2%. This is undeniably good news. 

But our optimism must be tempered by other measures 
of population health — particularly declining life 
expectancy. 

The reason behind the large drop is a decrease in mortality
for lung cancer; excluding lung cancer, the annual drop is still
about 1.5%. Several reactions to the Cancer Society’s news
heralded advances in precision treatments. Yet much of the 
continued reduction in mortality is due to the lower inci- 
dence of lung cancer, or a reduction in new cases per year. 
And new drugs cannot cause that. The two major therapeu- 
tic advances for treating this cancer — genome-targeted 
therapies and immunotherapy — are currently approved 
for the worst-off individuals: those with advanced or meta- 
static disease. 

Exciting technologies that uncover genetic drivers of 
cancer and unleash the immune system against it make 
headlines, but I think we must be careful not to give 
customized treatments too much credit, and I have been 
outspoken about my work to pin down the impact of these 
therapies. We would do better to focus on public-health 
strategies that are less glamorous. 

My colleagues and I have estimated that, as of 2018, 
8.33% of the US population with advanced cancer was 
eligible for genome-targeted therapy, up from 5.09% in 
2006 (J. Marquart et al. JAMA Oncol. 4, 1093-1098; 2018). 
Another work found that people whose lung cancers are 
eligible for genome-targeted treatments and who receive 
them live, overall, about 30 weeks longer than those who 
are eligible and are not treated (G. Singal et al. J. Am. Med. 
Assoc. 321, 1391-1399; 2019). That benefit is real, but is 
unlikely to have altered mortality rates markedly across 
a population. 

Similarly, immunotherapy — which expanded into the 
market in 2015 — might have had only limited effects on 
the drop in overall cancer mortality. The benefits for 
melanoma and for advanced and metastatic lung cancer 
are impressive, but so far affect relatively few people. 

Much bigger drops in US cancer mortality would 
come from a fairer society. The American Cancer 
Society estimates that, in 2014, 59% of lung-cancer deaths
observed in people aged 25-74 could have been averted by 
eliminating socio-economic disparities (R. L. Siegel et al. 
CA Cancer J. Clin. 68, 329-339; 2018).

What’s more, US life expectancy has fallen for three 
straight years. The cause is largely diseases of despair: 
drug overdose, suicide and alcohol-related liver disease. 
And these kinds of risk factor cluster. People who die from 
using opiates are more likely to smoke, for instance. The 
American Cancer Society uses age-standardized popula- 
tions to address concerns that a rise in untimely deaths 
could mask what would have been future cancer deaths 
and thus spuriously improve cancer death statistics, but 
it is hard to know exactly how factors behind declining life 
expectancy play into cancer mortality. 

The data do make it clear that the majority of our most 
effective solutions will be found outside the cabinet of 
cutting-edge medicines. If we want to do all that we can to 
reduce the burden of cancer and to improve life expectancy, 
we must harness the tools of population statistics. 

That means we need to create strategies to treat hyper- 
tension, end the use of tobacco products, dismantle 
policies that promote obesity and use of environmental 
carcinogens, encourage physical activity and reduce levels 
of carcinogens in the environment. In my cancer clinic, I 
often wish I had more effective drugs for the person in front
of me. I, too, want sophisticated treatments that work. But
what I really wish is that the person I’m treating did not 
have cancer at all. 

Our public policy is a series of self-inflicted wounds. 
The current US administration has allowed loopholes that 
let the known carcinogen asbestos remain in use. It has 
failed to improve standards for airborne particulate pol- 
lution, clearly linked to higher rates of diseases and death. 
It reversed a decision to ban a pesticide, chlorpyrifos, 
associated with impaired childhood brain development, 
and atrazine, linked to leukaemia. 

My deep frustration is this: it is hard to escape the 
conclusion that we, as a society, are not doing what it takes 
to maximize our health. We are prioritizing medications 
that cost US$100,000 a year or more, and at the same time 
are loosening restrictions on environmental pollution. 
These policies have one thing in common: they enhance 
corporate profits. It will take a realignment of public policy 
to make sure that we pursue systems that instead prioritize 
health. 

Public-health policies are not personalized to any 
individual, but can promote longevity for all of us, even 
if they will not make for feel-good stories about scientific
breakthroughs or miraculous drugs. In this exciting age 
of precision medicine, we will reap the biggest gains by 
celebrating better health for everyone. 
The world this week


News in brief


CHINA NEARS TOP SPOT
FOR RESEARCH SPENDING


The gap in research funding
between the United States
and China is closing fast,
despite modest increases in US
funding since 2000, according
to statistics assembled by
the US National Science
Foundation (NSF).

From 2000 to 2017, research 
and development (R&D) 
spending in the United States 
grew at an average of 4.3% 
per year, the NSF found. But 
spending in China grew by 
more than 17% per year during 
the same period. Several other 
countries, including Germany 
and South Korea, also increased 
their spending at rates that 
outstripped that of the United 
States, although they remain 
solidly behind the two global 
leaders in terms of total funding. 
The United States accounted for 
25% of the US$2.2 trillion spent 
on R&D worldwide in 2017, with 
China making up 23%. 
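
To put these growth rates in perspective, the sketch below compounds them over the 17 years from 2000 to 2017 and converts the 2017 shares into dollar amounts. It assumes constant annual growth at the quoted averages, which is a simplification, and it treats China's "more than 17%" as exactly 17%, so that result is only a lower bound. The function and variable names are illustrative and do not come from the NSF report.

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """Factor by which spending grows under constant compound annual growth."""
    return (1.0 + annual_rate) ** years


YEARS = 2017 - 2000  # 17 annual steps

# Average annual growth rates quoted in the text (China's is a lower bound).
us_multiple = growth_multiple(0.043, YEARS)
china_multiple = growth_multiple(0.17, YEARS)

# 2017 shares of the US$2.2 trillion spent on R&D worldwide.
WORLD_TOTAL_USD = 2.2e12
us_spending_2017 = 0.25 * WORLD_TOTAL_USD
china_spending_2017 = 0.23 * WORLD_TOTAL_USD

print(f"US growth multiple, 2000-2017: {us_multiple:.2f}x")
print(f"China growth multiple, 2000-2017 (lower bound): {china_multiple:.2f}x")
print(f"US 2017 spending: ${us_spending_2017 / 1e9:.0f} billion")
print(f"China 2017 spending: ${china_spending_2017 / 1e9:.0f} billion")
```

Under these assumptions, US spending roughly doubles over the period, whereas China's grows more than tenfold, which is why the remaining gap in 2017 is only two percentage points of the world total.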

The figures come from the 
latest edition of the NSF’s 
biennial Science and Engineering 
Indicators report, which 
compiles metrics on the state of 
science and engineering in the
country. The United States is
increasingly “seen globally as an
important leader rather than the
uncontested leader” in science
and engineering, according
to the report, released on
15 January.

Preliminary data from 2019 
suggest that China has already 
surpassed the United States in 
R&D spending, said Julia Phillips,
chair of the National Science 
Board’s science and engineering 
policy committee, during 
a press briefing. The board
oversees the NSF and produces
the Indicators reports.

The emergence of innovation 
powerhouses outside the 
United States “can only be 
good”, says Diane Souvaine, 
a computer scientist at
Tufts University in Medford, 
Massachusetts, who chairs the 
National Science Board. She 
notes that the United States 
still leads the world in many 
important metrics, such as total 
investment in R&D, proportion 
of highly cited publications and 
enrolment of internationally 
mobile students. 

However, the NSF report 
found that the number of 
foreign-born students enrolling 
in US universities has declined 
slightly in recent years. 


SCIENCE SPENDING 


China is catching up to the United States on funding for research and development. 
NEW VIRUS
SURGING IN ASIA
CAUSES ALARM


Scientists are increasingly 
concerned about a new virus 
that is spreading in Asia. As 
Nature went to press, Chinese 
officials had reported 291 cases 
nationwide, most in the city
of Wuhan, where the outbreak
began. Thailand, Japan and 
South Korea are among the 
nations that have reported 
infections. At least six people 
have died from the virus, which 
causes a respiratory illness. 

Chinese officials have also 
confirmed that the virus can 
spread from person to person, 
although the extent of such 
transmissibility is unclear. The 
surge in infections is alarming 
because of Chinese New Year 
this weekend, when hundreds of 
millions of people will travel to 
their home towns or overseas. 
“This could be the beginning 
of a disaster,” says Seungtak 
Kim, a virologist at the Pasteur 
Institute Korea in Seongnam, 
South Korea. 

The illness was first detected 
last December among people 
who had visited a live-animal 
market in Wuhan. Scientists 
have identified the pathogen as 
a coronavirus, from the same
family that causes severe acute 
respiratory syndrome, or SARS. 
As Nature went to press, the 
World Health Organization 
was set to meet on 22 January
to decide whether to declare a
public-health emergency over 
the virus. 
High risk of major

Researchers in the Philippines are monitoring the
Taal volcano closely for signs of a major eruption. The 
volcano’s activity has eased since it began spewing 
steam and ash more than a week ago, but the threat 
of a large-scale eruption remains, say scientists. In 
addition to the immediate risk to life, such an event 
could contaminate water supplies and disrupt power 
generation for millions, and halt ground and air travel. 

At 2.30 p.m. local time on 12 January, Taal started
ejecting lava and blew out a giant plume of rock 
fragments. Ash travelled as far north as Quezon City, 
some 70 kilometres away, forcing tens of thousands 
of people on Taal’s Volcano Island and in nearby 
provinces to flee. 

The volcano’s activity has stalled, but this does 
not mean the worst is over, says Mariton Bornas, a 
volcanologist at the Philippine Institute of Volcanology 
and Seismology just north of Manila. 

The volcano remains at level 4 on the country’s 
volcano-alert system, the second-highest level, meaning 
a hazardous eruption could happen in hours or days.


CATASTROPHIC 
AUSTRALIAN BUSH 
FIRES DERAIL 
RESEARCH 


The blazes raging across 
Australia have damaged lives, 
homes and businesses. They 
have also destroyed scientific 
equipment and derailed 
research. 

Remote-sensing specialist 
Will Woodgate at the University 
of Queensland in Brisbane 
manages a site in the Bago State 
Forest that gathers data on 
land surface conditions to feed 
into global climate models. As 
fire tore through the site on 
New Year's Eve, the data that 
have flowed from it for 20 years 
stopped. Photos suggest that 
the layer of vegetation under the 
forest canopy has been wiped 
out, although the canopy itself is 
intact. Woodgate says sensors at 
the top of a tower at the centre 
of the site could have survived. 

Elsewhere, the Australian 
Mountain Research Facility was 
set up last year by the Australian 
National University in Canberra 
to study how a changing climate 
affects alpine landscapes. It had 
planned to deploy sensors and 
monitoring equipment to its 
eight field sites in the Australian 
summer. But fire at one site 
has left “nothing but bare soil”,
says soil scientist Zach Brown, 
the senior technical officer 
for the project. Installation of 
equipment across the network 
has been set back by a year, 
he says. 




OZONE-EATING GASES 
LINKED TO EXTREME 
ARCTIC WARMING 


Gases that deplete Earth’s 
protective ozone layer could
be responsible for up to half of
the effects of climate change 
observed in the Arctic from 1955 
to 2005. 

The finding, published on 
20 January, could help to explain 
the disproportionate toll that 
climate change has taken on the 
region, an effect that has long 
puzzled scientists (L. M. Polvani et al. Nature Clim. Change
http://doi.org/djt5; 2020). The Arctic
is warming at more than twice 
the average rate of the rest of the 
globe — a phenomenon known 
as Arctic amplification — and it 
is losing sea ice at a staggering 
pace. 

Ozone-depleting substances, 
including chlorofluorocarbons 
(CFCs), are known to heat the 
atmosphere more efficiently 
than carbon dioxide. But most 
research on these chemicals has 
focused on their effects on the 
ozone layer. 

A team of researchers
compared climate simulations 
both with and without the mass 
emission of CFCs that began in 
the 1950s. Without CFCs, the 
simulations showed an average 
Arctic warming of 0.82 °C, but 
with CFCs, the number jumped 
to 1.59 °C. 

Replicating these results in 
multiple climate models will be 
crucial for improving estimates 
of how much responsibility 
CFCs bear for heating the Arctic, 
say researchers. 
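
The "up to half" figure can be checked against the two simulation averages quoted above. This is a back-of-the-envelope sketch that assumes the CFC contribution is simply the difference between the two runs; the study's own attribution method may differ, and the variable names are illustrative.

```python
# Average simulated Arctic warming quoted in the text, in °C.
warming_without_cfcs = 0.82
warming_with_cfcs = 1.59

# Assumption: warming attributable to CFCs is the gap between the two runs.
cfc_contribution = warming_with_cfcs - warming_without_cfcs

# Share of the total simulated warming that the gap represents.
cfc_share = cfc_contribution / warming_with_cfcs

print(f"Warming attributed to CFCs: {cfc_contribution:.2f} °C")
print(f"Share of total simulated warming: {cfc_share:.0%}")
```

On this simple reading, CFCs account for a little under half of the simulated Arctic warming, which is consistent with the claim that ozone-depleting gases could be responsible for up to half of the observed effect.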


News in focus 


The summit of Mauna Kea in Hawaii already hosts 13 telescopes. 


HOW THE FIGHT OVER A
HAWAII MEGA-TELESCOPE 
COULD CHANGE ASTRONOMY 


Thirty Meter Telescope controversy is forcing scientists to 
grapple with how their research affects Indigenous peoples. 


By Alexandra Witze 
on Mauna Kea, Hawaii 


One morning earlier this month, on the
rain-soaked slopes of Mauna Kea in
Hawaii, Noe Noe Wong-Wilson was
settled in for the long haul. Wrapped
in a trench coat to keep out the wind
and cold, the educator and activist held a
meeting amid camp beds and folding chairs
inside a giant tarpaulin-sheltered tent.

Wong-Wilson is a leader of the Mauna Kea
kia’i, a group of Native Hawaiians who have
been encamped near the volcano’s base since
last July. They are preventing construction 
workers from building an enormous telescope 
near the summit, on land the kia’i regard as 
sacred. The planned Thirty Meter Telescope 
(TMT) would transform astronomy by peer- 
ing into the Universe with sharper vision than 
that of nearly any other. But there are already 
13 telescopes atop Mauna Kea, and the kia’i 
say that adding the TMT would be too much. 

If project officials cannot work out a way 
to build the telescope in Hawaii, they intend 
to move it to an alternative — but slightly less 
scientifically compelling — site in Spain’s 
Canary Islands. Whatever the outcome, 
the debate over the TMT is profoundly
transforming how astronomy is done in 
Hawaii. The island chain — one of the world’s 
best places for stargazing — has become a 
testing ground for the ethics of conducting 
research in a place full of injustice towards 
Indigenous peoples. 

“Gone are the days of the scientific conceit 
of being separate from the community,” says 
Jessica Dempsey, deputy director of the East 
Asian Observatory, which operates a telescope 
on Mauna Kea. “Astronomers really have to do 
more contemplation about where they are in 
the world, and about the social context and 
impact of their work.”

How the Mauna Kea stand-off plays out 
could affect astronomical research in other 
locations and other fields of science around 
the world, she says. 

Astronomers confronted this new reality 
this month, when thousands of them attended 
a meeting of the American Astronomical 
Society in Honolulu. The conference featured 
many sessions on Hawaiian culture and astron- 
omy and saw anti- and pro-TMT demonstra- 
tions. “It’s an industry that is congruent with 
our culture as explorers,” said Malia Martin, a 
Native Hawaiian who supports the TMT, as she 
waved a Hawaiian flag outside the convention 
centre. 


Changing course 


The fight over the TMT has become a symbol 
of historical inequities in Hawaii, notably the 
seizure of lands from Native Hawaiians before 
and after the United States annexed the islands 
in 1898. “This is a political issue rooted in 
historical injustice,” says Greg Chun, executive 
director of Mauna Kea stewardship for the 
University of Hawaii, which manages the 
mountaintop land on which the observatories 
sit. Homes and vehicles across the islands 
often fly the Hawaiian flag upside down as a 
symbol of protest against the US government. 

TMT officials have tried to address some of 
these long-standing issues, in part by estab- 
lishing educational and workforce-training 
programmes for local residents. But the pro- 
ject, which is expected to cost its partners 
in the United States, India, China, Japan and 
Canada more than US$1.4 billion, has not been 
able to proceed with construction. Both times 
it tried — first in 2015, and then again in July 
2019 — the kia’i blocked the road to Mauna 
Kea’s summit. 

The 13 existing telescopes atop the 
mountain face an uncertain future. The Uni- 
versity of Hawaii has committed to removing 
five as a condition of the permit to build the 
TMT. The three chosen so far are among the 
oldest telescopes on Mauna Kea. 

The future of the rest — which include 
some of the world’s most scientifically pro- 
ductive observatories, such as the Keck 
and Canada-France-Hawaii telescopes — is 
assured only until 2033. Astronomy will end 
on Mauna Kea after that if the state govern- 
ment does not renew the university’s master 
lease on the mountaintop, which governs all 
the telescopes’ operations. 

From her spot at the base of the mountain, 
Wong-Wilson says she is open to the possi- 
bility of the lease being renewed. “There is 
space for discussion about improving the way 
astronomy remains upon our mountain,” she 
says. “But attitudes have to change. Astron- 
omers look at us like we’re the bad guys, like 
we’re intruding on their space. It’s quite the 
opposite: they’re in our space.” 
Cutting-edge astronomy should continue
within the footprint of the existing observato- 
ries, says Rosie Alegado, an oceanographer at 
the University of Hawaii at Manoa. She helps 
lead a group of Native Hawaiian scientists who 
this month called for an immediate halt to the 
TMT project while organizers seek “informed 
consent” for the telescope to move forward
(S. Kahanamoku et al. Preprint at https://arxiv.org/abs/2001.00970;
2020). They also called
for Indigenous people to have more overall 
input into decisions involving the mountain. 
“I feel like astronomy on Mauna Kea could 
represent an example of when science got off 
course, but we course-corrected and came 
back stronger than ever,” she says. 


Momentous decision 


How that might happen remains to be seen. 
If the TMT moves to the Canary Islands, it 
will take with it money it would otherwise 
spend to help maintain the infrastructure for
astronomy on Mauna Kea, such as the road
to the summit. The move could also shift the 
focus of TMT partners, a few of whom operate 
some of the existing telescopes, away from 
Hawaii. 

State and local governments have brokered 
a detente between TMT officials and the kia’i 
until the end of February. Representatives of 
various groups are meeting to try to hammer 
out some sort of agreement for whether and 
how the TMT might proceed on Mauna Kea. 

But the clock is ticking. The telescope needs 
funding from the US National Science Foun- 
dation to keep moving forward. To get it, the 
project would need to be ranked highly in 
the next ‘decadal’ survey of priorities for US 
astronomy, which scientists are compiling. 
Results are expected in early 2021. The TMT 
might not get a high ranking if it can’t show a
clear path to construction, which means that
the issues with Mauna Kea need to be sorted 
out, or it needs to move to the Canaries. 

For Dempsey, the debate has pushed 
long-simmering disagreements over science 
and land rights to the fore. “I’m kind of glad 
in some ways that we've been forced into this 
conversation,” she says. “We didn’t do enough 
creative things in our local community in 
Hawaii until we were forced to — by people 
saying that this is not okay.” 


SUPERCOMPUTER SCOURS 
FOSSIL RECORD FOR 
HIDDEN EXTINCTIONS 


Palaeontologists have charted 300 million years 
of Earth’s history in breathtaking detail. 


By Ewen Callaway 


Palaeontologists have a fuzzy view of
Earth’s history. An incomplete fossil 
record and imprecise dating tech- 
niques make it hard to pinpoint events 
that happened within geological eras 
spanning millions of years. Now, a period that 
saw a boom in animal complexity and one of
Earth’s greatest mass extinctions is coming 
into sharp focus. 

Using the world’s fourth most powerful 
supercomputer, Tianhe II, a team of scientists
based mostly in China mined a fossil data- 
base of more than 11,000 species that lived 
during the period from around 540 million to 
250 million years ago. The result is a history of 
life during this period, the early Palaeozoic era, 
that can pinpoint the rise and fall of species 
during diversifications and mass extinctions 
to within about 26,000 years (J.-x. Fan et al.
Science 367, 272-277; 2020). 

“It is kind of amazing,” says Peter Wagner, a
palaeontologist and evolutionary biologist at 
the University of Nebraska-Lincoln, who was 
not involved in the work. Being able to look 
at species diversity on this scale is like going 
from a system where “people who lived in the
same century are considered to be contempo- 
raries, to one in which only people who lived 
during the same 6-month period are deemed 
to be contemporaries”, he wrote in an essay 
accompanying the study (P. Wagner Science 
367, 249; 2020). 

Such a view, Wagner adds, will help scien- 
tists to identify the causes of mass extinctions 
— such as the event at the end of the Permian 
period, some 252 million years ago, that wiped 
out more than 95% of marine species — as well 
as understand less dramatic species die-offs
and rebounds that have been hard to uncover
because of gaps in the fossil record. An understanding
of these processes could reveal parallels
to the planet’s current loss of biodiversity.

Trilobites disappeared from the fossil record during a mass extinction 252 million years ago.


Patchy record 


Most organisms in Earth’s history didn’t leave 
fossils, and scientists have identified only a 
tiny fraction of those that did. As a result, it
can be hard to tell whether changes in the 
fossil record mark real shifts, such as mass 
extinctions, or are simply caused by a lack of 
fossil finds. 

In the 1960s, palaeontologists began 
analysing the fossil record systematically, 
revealing multiple mass extinctions and 
periods during which life flourished. But 
these and later efforts could usually pinpoint 
biodiversity changes to within only about ten 
million years, because fossils were lumped into 
relatively long geological periods and analysed 
en masse. 

To improve on this, a team led by palaeon- 
tologist Jun-xuan Fan at Nanjing University in 
China created and analysed a database of fossil 
marine invertebrates that were found in more
than 3,000 layers of rock, mostly from China 
but representing geology around the planet 
during the early Palaeozoic. The group then 
used software to measure when individual 
species had emerged and gone extinct. 

The program took advantage of the fact 
that species were usually found in multiple 
rock formations — each spanning hundreds 
of thousands to millions of years — and used 
this information to place upper and lower lim- 
its on the period in which the species actually 
existed. The effort revealed for how long, and 
in what order, all 11,000 species had existed. It 
took the supercomputer around seven million 
processor hours. 
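
The idea of bracketing a species' lifespan from the rock formations it occurs in can be illustrated with a toy calculation. This is a minimal sketch, not the team's software: the `Formation` type, the `range_limits` function and all ages are hypothetical, and it assumes that each formation's age span is known exactly. Ages are in millions of years ago (Ma), so larger numbers are older.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[float, float]  # (older age, younger age), both in Ma


@dataclass(frozen=True)
class Formation:
    """A rock formation and the age span it covers (hypothetical data)."""
    name: str
    oldest_ma: float    # age of the base of the formation
    youngest_ma: float  # age of the top of the formation


def range_limits(formations: List[Formation]) -> Tuple[Span, Optional[Span]]:
    """Bracket a species' range from the formations that contain it.

    Returns (widest, guaranteed):
    - widest: the full span covered by the fossil-bearing formations.
    - guaranteed: the span the species must have lived through in order
      to occur in every formation, or None if no such span is forced.
    """
    if not formations:
        raise ValueError("need at least one formation")

    widest = (
        max(f.oldest_ma for f in formations),
        min(f.youngest_ma for f in formations),
    )

    # The species was alive at some moment inside each formation, so it
    # had originated by the top of every formation and had not yet died
    # out at the base of every formation.
    alive_by = max(f.youngest_ma for f in formations)
    alive_until = min(f.oldest_ma for f in formations)
    guaranteed = (alive_by, alive_until) if alive_by > alive_until else None
    return widest, guaranteed


if __name__ == "__main__":
    finds = [Formation("A", 500.0, 490.0), Formation("B", 480.0, 470.0)]
    print(range_limits(finds))
```

With two non-overlapping formations, the species must have spanned the gap between them, which gives a lower limit on its duration; the outer edges of the formations give the widest range that the finds alone can support.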


Using this approach, the team was able to 
learn extra details about events such as the 
end-Permian extinction, and the Cambrian 
explosion around 540 million years ago. The 
analysis showed, for instance, that species 
diversity declined in the 80,000 years leading 
up to the end-Permian mass extinction, which 
itself occurred over about 60,000 years. 


The findings also cast doubt on the exist- 
ence of a smaller-scale die-off known as the 
end-Guadalupian extinction, which is thought
to have wiped out many marine species around 
260 million years ago. That was the biggest 
surprise, says Mike Benton, a palaeontologist 
at the University of Bristol, UK, who has docu- 
mented changes in vertebrate diversity during 
that period. The study, he adds, “represents a 
pretty amazing big-data endeavour”.

Benton hopes to see the effort extended to 
later periods — particularly the past 100 million 
years. Palaeontologists disagree over whether 
an apparent increase in animal diversity in this 
period is the result of sampling bias. 

Norman MacLeod, a palaeontologist at 
the University of Nanjing and a co-author of 
the study, says the team’s work might help to 
reveal the underlying causes of changes in 
biodiversity, by charting ups and downs on a
timescale that can be matched with environ- 
mental and climatic shifts. 

Wagner adds that the team’s approach will 
be most valuable in uncovering — and explain- 
ing — smaller-scale extinctions, not dissimilar 
to those occurring today. Such extinctions 
could turn out to be “a bad 100,000 years, or 
a bad week” for some groups of organisms but
not others, he says. “When you get this reso- 
lution, it starts opening the doors to actually 
testing what the smaller-turnover events 
might be like.” 


STUDIES OF EMBRYO-LIKE 
STRUCTURES STRUGGLE 
TO WIN US GRANTS 


Biologists say they need clearer guidelines 
on funding rules for this nascent field. 


By Nidhi Subbaraman 


Scientists can now create clumps of cells
that resemble human embryos, raising 
hopes that they could study the elusive 
first stages of human development 
while avoiding the ethical concerns 
that make it difficult to study actual human 
embryos. But as these embryo models — in 
which human stem cells are transformed into
embryo-like structures whose growth mirrors 
stages of embryonic development — grow in 
popularity, US researchers say that they are 
finding it increasingly difficult to obtain fed- 
eral funding for such work. 

The US National Institutes of Health (NIH) 
in Bethesda, Maryland, has funded and still 
does fund work on embryo-like structures. 
A spokesperson told Nature that the agency
considers grant applications involving mod- 
els that “could be considered an organism” on 
a “case-by-case basis”, and cited a provision 
of federal law known as the Dickey-Wicker 
Amendment, which bars the government 
from funding research that creates or destroys 
human embryos. 

But the ban, which dates back to 1996, 
was put in place before the advent of tech- 
niques that produce embryo-like structures 
from stem cells. Scientists working on such 
research say that they need clearer guidance 
on what is eligible for federal funding. “The 
writing on the wall is that synthetic embryos 
are out of bounds with the NIH. The next step 
in the science is not allowed,” says Eric Siggia, a
physicist who studies developmental systems 
at the Rockefeller University in New York City. 

Amid this growing criticism, the agency’s 
Office of Science Policy asked the US National 
Academies of Sciences, Engineering, and Med- 
icine (NASEM) to host a day-long workshop to 
lay out the latest developments in experiments 
with embryo-like structures. At the NIH’s 
request, the meeting on 17 January in Wash- 
ington DC did not include any presentations 
on ethics or regulations.

The NASEM meeting was intended to help 
people to “better understand some of the 
unknowns associated with this nascent field”, 
Carrie Wolinetz, the NIH’s acting chief of staff 
and associate director for science policy, wrote 
in a blogpost last year. “Can research involving
various models of aspects of human embryo 
development be supported by NIH? The 
answer is ‘it depends’,” she added.


Sticky wicket 


Embryo research in the United States has long 
been fraught. In addition to the Dickey-Wicker 
Amendment, US scientists are guided by an 
internationally acknowledged ethical guide- 
line called the 14-day rule. This limits embryo 
research to the two-week period after ferti- 
lization. And last June, the US government 
halted fetal-tissue research by government 
scientists and began requiring that any grant 
application involving such material undergo 
an extra ethics review. 

None of these laws and guidelines specifi- 
cally deals with the increasingly complex col- 
lections of cells that mimic the early stages of 
human embryonic development, and can shed
light on processes that are otherwise difficult 
to study. Crucially, embryo-like structures are 
not formed from an egg and sperm, as real 
embryos are. Scientists say that it is unclear 
whether or how existing guidelines are being 
applied to research that uses the structures. 

Siggia and a colleague at Rockefeller, developmental
biologist Ali Brivanlou, submitted
a progress report to the NIH in 2018 on their 
grant to study the mechanisms by which 
colonies of embryonic stem cells organize 
themselves. Siggia says that they were told 
by NIH staff to cut plans for research in which 
synthetic embryonic cells would interact 
with “extra-embryonic” cells — tissue that 
grows into the placenta and other structures 
that nourish an embryo. “The mix of extra- 
embryonic and embryonic cells could get what 
someone would construe to be an embryo — 
and they didn’t want to go anywhere near that,” 
Siggia says. But he argues that the work would 
be the next logical step in experimental design.

He and Brivanlou resubmitted their plans 
for the next year after altering the original text. 
“Then it moved forward,” he says. 

The Rockefeller group is not the only one 
adapting its plans so that it can continue its 
work. Aryeh Warmflash, a stem-cell biologist 
at Rice University in Houston, Texas, says he 
isn’t applying for federal funding for work
that uses embryo-like structures to study 
the phase of development known as gastru- 
lation. “It doesn’t seem to me to be worth the 
effort,” Warmflash says. He is turning to private 
funders. 

And Fu Jianping, a bioengineer at the University
of Michigan in Ann Arbor, says that he submitted
a grant application to the NIH to study
the origin of cells that are precursors to eggs 
and sperm using embryo-like structures. The 
agency reviewed and scored it last June, and a
programme officer e-mailed Fu a list of ques- 
tions, including one that asked whether his 
experiments would involve extra-embryonic 
tissue. Several months later, Fu says he hasn't 
received any funding. “The uncertainty from 
the funding agencies is definitely going to be 
a roadblock to continued progress,” he says.


An NIH spokesperson told Nature that 
scientists with questions about any grant 
application or award could contact the rele- 
vant agency official, and that the agency does 
not comment on unfunded grant applications. 

The International Society for Stem Cell 
Research in Skokie, Illinois, said on 16 Janu- 
ary that it would release updated guidelines 
in early 2021 to address the complexity of 
research with embryo-like structures. It also 
released a series of recommendations for 
researchers to follow until then. 

“The NIH of course is struggling with the 
question when is an embryo not an embryo,” 
says Janet Rossant, a developmental biologist 
at the Hospital for Sick Children in Toronto, 
Canada, and an organizer of the NASEM work- 
shop. “I would also absolutely say we’re not 
close to a line that should not be crossed.”


HUGE SURVEY REVEALS
PRESSURES OF
SCIENTISTS’ LIVES


Global study highlights long hours, poor job 
security and mental-health struggles. 


By Alison Abbott 


A survey of more than 4,000 scientists
has painted a damning picture of the
culture in which they work, suggesting
that highly competitive and often
hostile environments are damaging
the quality of research.


Around 80% of the survey’s participants — 
mostly academic researchers in the United 
Kingdom — believed that competition had fos- 
tered mean or aggressive working conditions, 
and half described struggles with depression 
or anxiety. Nearly two-thirds of respondents 
reported witnessing bullying or harassment 
and 43% said they had experienced it. 


COST OF THE CULTURE 


In a global survey of around 4,000 researchers, 55% said that they had a negative impression of 
scientific working cultures. One-quarter said that the culture damaged the quality of research. 


Chart panels: ‘How would you describe research culture?’ (positive, neutral, negative) and ‘What effect does the culture have on research quality, individuals and society?’ (quality of research, individuals, society).
“These results paint a shocking portrait of
the research environment — and one we must 
all help change,” says Jeremy Farrar, director of 
Wellcome, a major research funder in London
that conducted the study with market-research 
agency Shift Learning. “A poor research culture 
ultimately leads to poor research.” 

Farrar says that Wellcome — which supports 
some 15,000 people working in science world- 
wide — is committed to addressing the issues 
highlighted by the survey, and he calls on 
the entire research system to get on board. 
“The pressures of working in research must 
be recognized and acted upon by all, from 
funders to leaders of research and to heads 
of universities and institutions,” he says. 


Unsustainable environment 


Wellcome conducted the survey, published 
on 15 January, as part of a broader drive to 
improve working environments in science. 
It says the push for excellence has created a 
troubling culture. “It’s more than clear that our 
current research practice is not sustainable,” 
says Beth Thompson, who leads Wellcome’s 
research-culture initiatives. “We knew things 
were not right, from our own discussions with 
scientists, from high-profile bullying cases, 
reports of misconduct and irreproducibility.” 

The results come from an online survey 
open to all researchers, which was answered 
by around 4,300 people across career stages 
and disciplines. Respondents hailed from 
87 countries; three-quarters were in the 
United Kingdom. Workshops with 36 UK-based 
researchers and in-depth interviews with 
94 also informed the findings. 

Most researchers reported having pride in 
their institutions and passion for their work, 
but spoke of the high personal toll of their 
environment (see ‘Cost of the culture’). Many 
accepted that pressure and long hours came 
with the territory — two-thirds of respondents 
said they worked for more than 40 hours a 
week. But researchers said that the situation 
was worsening and that the negative aspects 
were no longer offset by job security and the 
ability to work autonomously, flexibly and 
creatively. Barely 30% of respondents felt 
that there was job security in research careers. 

Many blamed funders and institutes that 
emphasize performance indicators and met- 
rics such as number of publications and the 
impact factors of journals in which researchers 
publish. They said that the importance of these 
metrics is often stressed in ways that reduce 
morale and encourage researchers to game 
the system. Some said that good management 
could shelter scientists from such distorting 
pressures, but that it was too seldom applied. 

One-quarter of respondents thought 
that the quality of research suffered in the 
unsupportive environments. The same 
proportion had felt pressured by their super- 
visors to produce a particular result. 


Quantum entanglement is at the centre of a new mathematical proof. 


THE ‘SPOOKINESS’ OF
QUANTUM PHYSICS
COULD BE INCALCULABLE


Proof at the nexus of pure mathematics and 
algorithms puts ‘quantum weirdness’ on a new level.


By Davide Castelvecchi 


Albert Einstein famously said that
quantum mechanics should allow
two objects to affect each other’s
behaviour instantly across vast
distances, something he dubbed
“spooky action at a distance”. Decades after
his death, experiments confirmed this. But, 
to this day, it remains unclear exactly how 
much coordination nature allows between 
distant objects. Now, five researchers say 
that they have solved a theoretical problem 
that shows that the answer is, in principle, 
unknowable. 

The team’s proof, presented in a 165-page
paper, was posted on the arXiv preprint repository
on 14 January, and has yet to be peer
reviewed. If it holds up, it will solve in one 
fell swoop a number of related problems in 
pure mathematics, quantum mechanics and 
a branch of computer science known as com- 
plexity theory. In particular, it will answer a 
mathematical question that has gone unsolved 
for more than 40 years. 

If their proof checks out, “it’s a super-beautiful
result”, says Stephanie Wehner, a theoretical
quantum physicist at Delft University of
Technology in the Netherlands.

At the heart of the paper is proof of a 
theorem in complexity theory, which is 
concerned with efficiency of algorithms. 
Earlier studies had shown this problem to be 
mathematically equivalent to the question of 
spooky action at a distance — also known as 
quantum entanglement.


Quantum game theory 


The theorem concerns a game-theory 
problem, with a team of two players who 
are able to coordinate their actions through 
quantum entanglement, even though they are
not allowed to talk to each other. This allows 
both players to ‘win’ much more often than 
they would without quantum entanglement. 
But it is intrinsically impossible for the two 
players to calculate an optimal strategy, 
the authors show. This implies that it is 
impossible to calculate how much coordina- 
tion they could theoretically achieve. “There 
is no algorithm that is going to tell you what is
the maximal violation you can get in quantum 
mechanics,” says co-author Thomas Vidick 
at the California Institute of Technology in 
Pasadena. 
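
To make the advantage of entanglement concrete, here is a minimal sketch using the CHSH game, a standard textbook two-player game. It is an assumption chosen for illustration, not necessarily the game analysed in the paper. Each player receives a random bit (x or y) and answers with a bit (a or b); they win when a XOR b equals x AND y. The function names are hypothetical, and the measurement angles are the usual textbook choice for a shared maximally entangled pair.

```python
import math
from itertools import product


def classical_best() -> float:
    """Best win rate without entanglement.

    Randomized strategies are mixtures of deterministic ones, so it is
    enough to try every deterministic rule mapping question to answer.
    """
    best = 0.0
    for a0, a1, b0, b1 in product((0, 1), repeat=4):
        alice, bob = (a0, a1), (b0, b1)
        wins = sum(
            1
            for x, y in product((0, 1), repeat=2)
            if (alice[x] ^ bob[y]) == (x & y)
        )
        best = max(best, wins / 4)
    return best


def quantum_standard() -> float:
    """Win rate of the textbook entangled strategy.

    For a maximally entangled pair measured along directions at angles
    alpha and beta, the two outcomes agree with probability
    cos^2(alpha - beta).
    """
    alice_angles = (0.0, math.pi / 4)
    bob_angles = (math.pi / 8, -math.pi / 8)
    total = 0.0
    for x, y in product((0, 1), repeat=2):
        p_same = math.cos(alice_angles[x] - bob_angles[y]) ** 2
        # The players need different answers only when x = y = 1.
        total += (1.0 - p_same) if (x & y) else p_same
    return total / 4


if __name__ == "__main__":
    print(f"classical best:   {classical_best():.4f}")
    print(f"quantum strategy: {quantum_standard():.4f}")
```

For this small game the best quantum win rate is known exactly. The result described above concerns such games in general, where no algorithm can compute the optimum; the sketch does not demonstrate that, only the gap between the classical and entangled cases.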

“What’s amazing is that quantum 
complexity theory has been the key to the 
proof,” says Toby Cubitt, a quantum-infor- 
mation theorist at University College London. 

News of the paper spread quickly through 
social media after the work was posted, spark- 
ing excitement. “I thought it would turn out to 
be one of those complexity-theory questions 
that might take 100 years to answer,” tweeted 
Joseph Fitzsimons, chief executive of Horizon 
Quantum Computing, a start-up company in 
Singapore. 

“I’m shitting bricks here,” commented 
another physicist, Mateus Araújo at the Austrian
Academy of Sciences in Vienna. “I never
thought I'd see this problem being solved in 
my lifetime.” 


Observable properties 


On the pure-maths side, the problem was 
known as the Connes’ embedding problem, 
after the French mathematician and Fields 
medalist Alain Connes. It is a question in the 
theory of operators, a branch of maths that 
itself arose from efforts to provide the foun- 
dations of quantum mechanics in the 1930s. 
Operators are matrices of numbers that can 
have either a finite or an infinite number of 
rows and columns. They have a crucial role 
in quantum theory, whereby each opera- 
tor encodes an observable property of a 
physical object. 


In a 1976 paper, using the language of
operators, Connes asked whether quantum 
systems with infinitely many measurable 
variables could be approximated by simpler 
systems that have a finite number. 

But the paper by Vidick and his collaborators 
shows that the answer is no — there are, in 
principle, quantum systems that cannot be 
approximated by ‘finite’ ones. According
to work by physicist Boris Tsirelson, who
reformulated the problem, this also means
that it is impossible to calculate the amount of
correlation that two such systems can display
across space when entangled.


Disparate fields 


The proof has come as a surprise to much of 
the community. “I was sure that Tsirelson’s 
problem had a positive answer,” commented 
Araújo on one blog, adding that the result
shook his basic conviction that “nature is in 
some vague sense fundamentally finite”. 

But researchers have barely begun to grasp
the implications of the results. Quantum
entanglement is at the heart of the nascent 
fields of quantum computing and quantum 
communications, and could be used as the 
basis of super-secure networks. In particular, 
measuring the amount of correlation between 
entangled objects in a communication system
can provide proof that it is safe from eaves- 
dropping. But the results probably do not 
have technological implications, Wehner says, 
because all applications use quantum systems 
that are finite. In fact, it could be difficult to 
even conceive an experiment that could test 
quantum weirdness on an intrinsically infinite 
system, she says. 

The confluence of complexity theory, quan- 
tum information and mathematics means that 
there are very few researchers who say that 
they are able to grasp all the facets of this 
paper. Connes himself told Nature that he 
was not qualified to comment. But he added 
that he was surprised by how many ramifica- 
tions it has turned out to have. “It is amazing 
that the problem went so deep and I never 
foresaw that!” 
Feature


The pollution 
detectives 


Someone, somewhere, is producing banned ozone-destroying chemicals. 
Meet the researchers tracking down the rogue polluters who are putting 
the planet at risk. By Jane Palmer 


High in the Swiss Alps, scientists in
a small research station are busy 
fingerprinting the atmosphere. 
Perched on a mountain ridge 
at around 3,450 metres altitude, 
the Jungfraujoch centre boasts 
five laboratories, a workshop, a 
library, a tiny kitchen and ten small 
bedrooms. Day and night, funnels suck in the 
thin mountain air and channel it into a series 
of instruments designed to separate, identify 
and measure the chemicals swirling through 
this pristine locale. “We are scanning the whole 
spectrum of thousands and thousands of 
molecules,” says atmospheric chemist Martin
Vollmer. “It is like we are taking the DNA of the 
atmosphere.” 

Vollmer, who works at the Swiss Federal 
Laboratories for Materials Science and Technology
(EMPA) in Dübendorf, specializes in
sniffing out newly emerging trace gases, which 
make up less than 1% by volume of the planet’s 
atmosphere. Some of the most notorious are 
the chlorofluorocarbon (CFC) coolants used 
for refrigeration and foam production. These 
destroy the ozone layer, the shield that pro- 
tects life on Earth from damaging ultraviolet 
light. In 1987, after researchers demonstrated 
the threat posed by CFCs, nations banded 
together to adopt an international agreement 
known as the Montreal Protocol, to control 
and eventually phase out CFCs. Updates to the 
treaty have outlawed some of their replace- 
ments, which also turned out to damage the 
ozone layer, climate or both. 

Behind the scenes, scientists such as 
Vollmer are keeping watch over the health 
of the atmosphere — in part to make sure 
nations are honouring their promises. “This 
is detective work,” says Stephen Montzka of 
the US National Oceanic and Atmospheric 
Administration (NOAA) in Boulder, Colorado. 
“Our remit is to understand if things are chang- 
ing as expected.” 
For many years, the news coming from these
air-monitoring campaigns was good. Concen- 
trations of CFCs and several other dangerous 
compounds were declining steadily. It was the 
biggest win in environmental policy the world 
has ever seen, say researchers. 

Then, in May 2018, Montzka reported a dis- 
turbing blip: levels of one of the most harmful 
chemicals, trichlorofluoromethane, known as
CFC-11, weren't dropping as fast as expected, 
suggesting that companies were producing 
this compound somewhere, in violation of 
the protocol. “It was the most surprising and 
shocking thing I’ve seen in my entire career,” 
Montzka says. 

Montzka’s research pointed to eastern Asia, 
and a follow-up study last May pinpointed the 
source of a significant fraction of the emissions
to two provinces in China. The discovery of
these rogue CFC-11 emissions has highlighted 
just how much the Montreal Protocol relies 
on the vigilance of scientists. But it has also 
raised questions about whether researchers 
can keep up with an ever-growing list of dam- 
aging compounds — some so new that their 
impacts remain unknown. 

For the moment, they hope they are win- 
ning. Last November, nations that are parties 
to the Montreal Protocol gathered in Rome, 
where Montzka presented some positive news 
about the illegal CFC emissions. 


Fresh start 

It all starts with fresh air. Every week, come 
rain, shine or, more typically, snow, Jen Morse 
makes the trek up to a small green shack on 
Colorado’s Niwot Ridge, which lies on the 
Front Range of the southern Rocky Mountains.
In summer, she can drive part of the way
and has to hike only the final kilometre of the 
6-kilometre trip; in winter, she has to ski the 
entire distance to the remote, wind-swept spot 
at 3,523 metres altitude, carrying four large gas 
canisters in her backpack. 
Once in the shack, Morse, who is a climate
technician at the University of Colorado, Boul- 
der, connects each flask to an inlet and waits 
for them to fill. She then heads back down 
and delivers the snapshots of mountain air to 
NOAA's Global Monitoring Division in Boulder, 
just 40 kilometres away. At the lab, Montzka 
and his colleagues run the flasks’ contents 
through three separate gas chromatographs 
to determine what resides in the ‘background’ 
atmosphere, which doesn’t have any nearby 
contamination and therefore provides a read- 
ing of chemicals circling the entire globe. “We 
have to pick special locations far away from 
local sources of pollution to do that,” Montzka 
says. “These are desolate areas that are hard 
and expensive and difficult to be at.” 

Flasks are shipped to the lab from 16 sites 
around the world, including the South Pole, 
the top of Greenland’s ice cap and the tip of 
Tasmania in Australia. 

The NOAA team runs samples through its 
instruments to determine the levels of 50 trace
gases in the atmosphere. The Jungfraujoch 
lab is part of a second, NASA-sponsored 
network called the Advanced Global Atmos- 
pheric Gases Experiment (AGAGE), which has 
13 active stations in a dozen nations. 

Some of these sites have been monitoring 
CFCs and related compounds since the 1970s. 
When these compounds were invented in the 
1920s, chemists regarded them as safe. But by 
the 1970s, researchers recognized that CFCs 
could drift up to the stratosphere and erode the 
protective ozone layer. This realization — along 
with the shocking discovery in 1985 of a hole in
the ozone layer over Antarctica — led nations 
to adopt the Montreal Protocol. 

NOAA and AGAGE researchers meet reg- 
ularly to discuss their findings, which they 
summarize in reports for the parties to the 
Montreal Protocol. These reports document 
the decline in the concentrations of CFCs in 
the atmosphere and they have identified other
ozone-damaging chemicals. As such, scientists
have continued to provide input into the protocol,
which nations have updated to limit the
production of other harmful gases. “It wasn’t a
one-stop scientific treaty,” says David Fahey, an
atmospheric chemist with NOAA, and one of
the four co-chairs of the Scientific Assessment
Panel of the Montreal Protocol.

The Jungfraujoch research station in
Switzerland is part of a global network
that monitors the atmosphere.

The teams monitoring the air are forever 
playing catch up as new compounds appear 
in the skies. Even before CFCs were banned, 
manufacturers developed substitute cool- 
ants such as hydrochlorofluorocarbons 
(HCFCs). But researchers quickly found that 
these compounds also damage the ozone 
layer, and a 2007 amendment to the protocol 
called for the complete ban of production and 
consumption of HCFCs by 2030. Next came a 
third generation of coolant, the hydrofluoro- 
carbons, or HFCs. These don’t contain chlo- 
rine or bromine, and so they don’t damage the 
ozone layer. But they turned out to be powerful 
greenhouse gases; most have a warming power 
between 1,400 and 5,000 times greater than 
that of carbon dioxide. 

Consequently, in 2016, delegates agreed on 
the Kigali Amendment to the Montreal Protocol,
which calls for cutting the production and use of
HFCs by 80-85% by the late 2040s. The amendment
entered into force at the start of 2019 with
the goal of avoiding warming by up to 0.5 °C. 

Monitoring stations such as Jungfraujoch 
track progress towards those goals in differ- 
ent parts of the world; sometimes they find 
problems. Scientists at the station found that 
northern Italy had emitted between 26 and 
56 tonnes of HFC-23 per year in 2008-10, yet 
the official Italian inventory had estimated 
only 2.6 tonnes for the whole country. 


Blindsided 


Until a few years ago, it seemed that the main 
threats to the ozone layer were on their way out 
and scientists could focus on the newer gases. 
Then came the first hints of trouble. 

One day in 2013, Montzka ran the air from 
his weekly delivery of flasks through the mass 
spectrometer he had designed nearly 30 years 
earlier. But when he looked at the output of 
these routine measurements from the previ- 
ous few months, he noticed something odd: 
the levels of CFC-11 were not declining as fast 
as before. 

To Montzka, the observation made no sense 
— production of CFCs had been phased out 
worldwide three years earlier. Before 2012, 
the concentration of CFC-11 had dropped by 
about 0.8% per year, but Montzka’s flask data 
suggested the decline rate had slowed substan- 
tially. “I was totally amazed, I couldn't believe 
it,” Montzka says. “Then I thought to myself 
that it was just some blip that will go away next
year: something weird has happened in the
atmosphere, or in my instrument.”

SECRET STOCKS

Researchers use data from two air-monitoring networks to calculate emissions of CFC-11, which
can come from new production or leakage from older products. Emissions declined as expected
until 2005, but then plateaued and started to rise because of rogue manufacturing.

Montzka double-checked his measure- 
ments and then, for the next few years, he and 
the international team searched for possible 
explanations. Eventually, the trail of evidence 
led to a single conclusion: emissions of CFC-11
were going up rather than down, pointing to a
violation of the Montreal Protocol (see ‘Secret 
stocks’). “It did take a while to unravel the story 
in a way that I thought would be useful to the 
international community,” Montzka says. 

Between 2002 and 2012, CFC-11 emissions 
averaged 54,000 tonnes per year, owing to 
gradual leakage of old stores of the compound 
contained in foam insulation and appliances 
made before the mid 1990s. But the research- 
ers found that between 2014 and 2016, average 
emissions grew to 67,000 tonnes a year — an 
increase of roughly 25%. They also noted that,
in 2013, the flask data at the Mauna Loa Obser- 
vatory in Hawaii suddenly showed increased 
levels of CFC-11 in the pollution plumes reg- 
ularly recorded at the site. On closer investi- 
gation, they found that the sources of those 
plumes, and the uptick in CFC-11 emissions, 
came from eastern Asia. 
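
The size of the jump can be checked with a one-line calculation. The sketch below uses only the two averages quoted above; the function name is hypothetical.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in per cent."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before * 100.0


# Average CFC-11 emissions quoted in the text, in tonnes per year.
EMISSIONS_2002_2012 = 54_000
EMISSIONS_2014_2016 = 67_000

if __name__ == "__main__":
    extra = EMISSIONS_2014_2016 - EMISSIONS_2002_2012
    rise = percent_increase(EMISSIONS_2002_2012, EMISSIONS_2014_2016)
    print(f"extra emissions: {extra} tonnes per year")
    print(f"increase: {rise:.1f}%")
```

The difference is 13,000 tonnes per year, or about 24% of the earlier average, consistent with the rounded figure of roughly 25%.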

A team of scientists immediately began to 
look for clues in an independent set of meas- 
urements, in particular those from the AGAGE 
stations on Jeju Island in South Korea, and 
Hateruma in Japan. Data from these stations 
revealed spikes in CFC-11 whenever plumes of 
pollution passed by. And the spikes had grown 
since 2013. 

With this information, the scientists ran 
computer models using atmospheric circu- 
lation data and the monitoring-station meas- 
urements to determine where the pollution 
was coming from. Four independent mod- 
elling groups worked on solving the puzzle, 
and all came back with the same answer: about 
7,000 tonnes per year were coming from the 
Chinese provinces of Shandong and Hebei (ref. 2).

The newly discovered emissions will not sig-
nificantly delay recovery of the ozone layer,
says Matthew Rigby, an atmospheric chemist
at the University of Bristol, UK. “But if they
carry on, we could be seeing delays of years
or more,” he says.

466 | Nature | Vol 577 | 23 January 2020


A close call


On 4 November 2019, Tina Birmpili, executive 
secretary of the UN Ozone Secretariat, deliv- 
ered her opening speech at the 31st Meeting of 
the Parties to the Montreal Protocol in Rome. 
She began by praising the success of the treaty 
so far and the decisive action taken by China 
to address its emissions of CFC-11, including 
setting up a national monitoring network and 
increased penalties for companies that violate 
production bans. “CFC-11 was an alarm for all 
parties to ensure that they address illegal pro- 
duction swiftly and send a clear message to 
those who would break the law,” Birmpili says.

Then Birmpili turned her attention to some
unanswered questions around the unexpected
CFC-11 emissions. The researchers’ most recent
published findings estimate that CFC-11 emis-
sions from China account for 40-60% of the
global increase between 2014 and 2017, but that
leaves 4,000-10,000 tonnes unaccounted for (ref. 2).

Right now, the researchers aren’t in a
position to say whether there are other sources
of illegal emissions or whether uncertainties
in their models can account for the remain-
ing percentage of the global trend, Rigby says.
In the future, they will try to improve their
models to see if they can glean a more accu-
rate picture of the CFC-11 changes, he says.

Montzka thinks that this time the monitoring
community was lucky: researchers were able
to detect the global trend change fairly early
and happened to be making measurements
near the region where at least some of the
new emissions were coming from. But if CFC-11
had emanated from India, Russia or South
America, the existing networks wouldn’t have
been able to identify the location of the source
because no regional stations exist nearby.

When Montzka stepped up to the podium in
Rome, he presented some fresh observations
from the global monitoring data. In 2018, the
rogue emissions seemed to slow or disappear.
The decline of the global concentrations of
CFC-11 accelerated, and the amount of the gas
in plumes reaching the monitoring stations in
Hawaii and Jeju Island substantially decreased.
Although researchers have yet to fully check 
the latest measurements, they take heart 
from the trend. “The evidence suggests that 
the Montreal Protocol is being effective in yet 
another set of circumstances — in this case, 
unprecedented circumstances,” Fahey says. 

If the CFC-11 concentrations continue to 
decline over the next few years, it will mark a 
significant victory for the scientists and their 
monitoring networks. “There’s always the 
discussion of whether it is really important 
that we are still here,” says Stefan Reimann, 
an atmospheric chemist at EMPA. “And, yes, 
history proves that we still have to be here.” 

The rogue-emissions incident highlights 
weaknesses in the current system, which was 
developed to investigate the science of how the
atmosphere is changing, not to track emissions,
says geochemist Ray Weiss at the University of
California, San Diego. “We never expected to
see a violation, which is a lesson in itself really.”

In response to the latest challenge, NOAA 
added a flask-collection site on the west coast 
of South Korea to gather more information 
from eastern Asia. And this year, the parties 
will continue to discuss what is needed to 
ensure a similar violation doesn’t happen 
again, Birmpili says. 

Meanwhile, the scientists are maintaining 
their strategy of watching, waiting and inves- 
tigating. At Jungfraujoch, Vollmer is paying 
close attention to the latest generation of 
coolants: hydrofluoroolefins (HFOs). As those 
break down, some of them, such as one known
as HFO-1234yf, can decompose into trifluoro- 
acetic acid, which is toxic to some plants and 
soil organisms. The German and Norwegian 
environment agencies have recommended 
more research on the HFOs. 

Measurements at Jungfraujoch show a rapid
rise in these compounds. In 2011, HFO-1234yf 
appeared in none of Vollmer’s samples. By 
2018, it was in 71% of them. 

Currently, industry produces only a small 
amount of HFOs because the phase-out of 
HFCs has just begun. “But if you make a back- 
of-the-envelope calculation and you replace all 
the compounds that we’ve been using previ- 
ously by the HFOs, there are going to be huge 
quantities of these gases,” Vollmer says. 

So he makes the journey each month to the 
high, glaciated saddle between two peaks in 
the Alps, where Jungfraujoch’s instruments 
hum away day and night. “We have to keep 
watching,” he says. 


Jane Palmer is a freelance writer based in 
Colorado. 


1. Montzka, S. A. et al. Nature 557, 413-417 (2018). 
2. Rigby, M. et al. Nature 569, 546-550 (2019).


Science in culture


Books & arts 


US Army nurses in 1947. Shifting social norms have driven the swift rise and demise of smoking in many places. 


Peer pressure shapes our world


Social context affects our actions. Policymakers should leverage that to 
cut emissions, boost health and more, a book argues. By Thomas Dietz 


In 1989, just 12% of US adults favoured
legalization of same-sex marriage; by
2015, that figure was around 60%. What
triggered the transformation? In Under
the Influence, economist Robert Frank
reveals that peer pressure lies behind many 
such step changes. Once views began to shift, 
the process was self-reinforcing. 

As Frank drives home, we humans are 
especially adept at learning from our peers. 
Our decisions are strongly influenced by 
social norms — what we think others are 
doing, and what we think they think we
should do. In some circumstances, we can
be self-interested; in others, we can be
altruistic. So it’s not surprising that much
of social-science research focuses on social
context in decision-making. Frank reviews
extensive evidence from studies across a
number of disciplines on how peer pressure
shapes the dynamics of smoking, drinking,
obesity, consumerism and many other impor-
tant social issues.

Under the Influence: Putting Peer Pressure to Work
Robert H. Frank
Princeton Univ. Press (2020)


Pressure point 

Because the tendency to emulate can lead to 
rapid social change, for better or worse, it is 
a key lever for policy. Yet, asserts Frank, that 
message has yet to reach many policy analysts
and economists. Under the Influence offers a
corrective through compelling arguments for 
incorporating social contexts into the design 
of policy on climate change, public health, 
the financing of public goods, social justice, 
taxation and beyond. 

Among the cascades of change Frank 
examines are ‘arms races’, which can focus on 
anything from nuclear weapons to consumer 
goods. They are a type of commons dilemma 
or collective-action problem: the pursuit 
of narrow self-interest leads to overuse of 
a resource, and disaster. (If foresters, for 
instance, limit the number of trees they fell 
every year, the forest can regenerate, to the 
benefit of all; if they each boost their own 
short-term profits by maximizing their fell- 
ing, the forest ecosystem might collapse.) 
But in an arms race, what matters is not your 
absolute measure of resources. It is what you 
have compared with what I have. Thus, every-
one has an incentive to accumulate resources
in a never-ending upward spiral.


Boom and bust 


Frank points, for example, to the sharp 
increases in US housing prices that led to the 
bubble of the early 2000s. To ensure access 
to the best school districts, buyers competed 
to live in the most affluent neighbourhoods, 
bidding up housing costs inexorably. The 
result was unrealistic prices, unsustainable 
mortgage burdens and a slump in price that led
to bankruptcies and the collapse of lenders — 
all of which contributed to the 2008 economic
meltdown. 


Frank examines another problematic arms 
race: the widespread opposition of the rich 
to increased taxation. This, he argues, hinges 
on what he calls the “mother of all cognitive 
illusions”: the belief that happiness is based 
on absolute wealth (and spending power), 
which higher taxes would slash. Frank coun- 
ters that view, asserting that rich people’s 
well-being is based on relative wealth — their 
position compared with that of their peers. A 
tax affecting all top earners would maintain 
relative position, whatever the effect on abso-
lute spending power. His analysis is timely,
because low and declining US tax rates for the
top income bracket have led to a loss of gov-
ernment revenue and, in turn, massive under-
investment in public goods such as education
and infrastructure. Frank suggests a remedy: 
taxing consumption (income minus savings) 
for the wealthiest. 

One of the great strengths of Under the 
Influence is Frank’s use of research from across 
the social sciences, including psychology and 
political science. Yet he fails to engage with 
much that’s salient to his arguments here. For 
instance, regarding policy challenges such as 
climate change and obesity, he admits that 
his “deepest passion” is efficiency: that is,
he favours taxation over regulation. Thus,
he adopts a standard utilitarian approach to
decision-making. To demonstrate the success
of this approach, he cites the US policy that
placed a price on sulfur dioxide emissions
from 1995, significantly reducing levels of
acid rain. But when he discusses the impor-
tance of in-depth deliberation in resolving
conflicts, and in changing individual views on
gay rights and environmental protection, he
does not mention the extensive literature on
how deliberative processes can underpin good
decision-making, a theory complementary to
his utilitarianism.

For many people, wealth relative to others is more important than absolute spending power.


Unexplored factors 


Frank’s analysis would thus benefit from even 
deeper digging into findings on context, social 
structure, power and social interaction, such 
as the critique of growth dynamics in environ- 
mental sociology or the 2017 book Beyond 
Politics, an analysis of private environmen- 
tal governance by Michael Vandenbergh 
and Jonathan Gilligan. For example, Frank’s 
argument about the well-being of the affluent 
resting on relative status does not factor in the
possibility that rich people might be seeking 
political power and influence on government 
instead. Among the richest, power might 
depend on absolute wealth. Similarly, his 
thoughtful chapter on climate change does 
not fully address opposition to climate policy 
from powerful fossil-fuel interests. 

Moreover, Frank mentions only in passing 
issues such as the human tendency to asso- 
ciate with those like us (homophily) and to 
affirm what we already believe (confirmation 
bias). In the social networks of government 
officials, lobbyists and others who influence 
policy, these tendencies lead to polarization 
and a lack of action on serious problems. So 
although Frank urges us to consider context, 
he misses the need to pay more attention to 
the structure of contexts, including inequality 
and power. 

Of course, one book, however broad its 
compass, cannot cover everything. And even 
where I felt Frank had not tackled important
lines of research, those gaps point to the need 
to think more deeply about human actions 
and the policies that shape them. At a time of 
multiple impending crises, Under the Influence 
will provoke your thinking in constructive ways. 


Thomas Dietz is university distinguished 
professor in sociology and of environmental 
science and policy at Michigan State 
University in East Lansing. 

e-mail: tdietzvt@gmail.com


Setting the agenda in research


Comment 


Christiana Figueres at the 21st United Nations Climate Change Conference. She led the negotiations that produced the 2015 Paris agreement. 


The secret to tackling 
climate change 


Christiana Figueres 


To the world leaders 
mustering in Davos: set your 
minds to reaching net-zero 
emissions, and you can forge 
the future we need.

As political leaders, industry executives
and celebrities gather this week for 
their yearly networking meeting 
in Davos, Switzerland, top of their 
agenda is the need to halve global 
carbon emissions by 2030. 

Of the many barriers to achieving this goal, 
the greatest is mindset. I had to learn this a 
decade ago when I was appointed to lead the
international climate-change negotiations 
that resulted in the 2015 Paris agreement: 
ultimately, 195 nations pledged to reduce 
emissions and alter their economies to pro-
tect our planet. They also agreed to increase
their efforts towards net-zero emissions
substantially every five years. That makes 
2020 a crucial year. We cannot afford for 
governments to let that key commitment slip.

The Paris agreement was a breakthrough 
after a devastating collapse in Copenhagen 
in 2009, when years of preparation and two 
weeks of excruciating around-the-clock 
negotiations produced only a weak, legally 
irrelevant accord. Copenhagen was a free- 
for-all of political frustration, outrage and 
disagreement — with the global north and 
global south set against each other. Last 
month’s United Nations climate meeting 
in Madrid left many of us similarly bereft. 
That makes the lesson of how we got from 
Copenhagen to Paris all the more relevant. 

It started with my making a big mistake in 
the summer of 2010, at a press conference 
with 40 journalists in a windowless room at 
the Maritim Hotel in Bonn, Germany. When 
asked whether a global agreement on climate 
change would ever be possible, I blurted out, 
without thinking, what most already thought: 
“Not in my lifetime.” That’s how close I came to
making failure a self-fulfilling prophecy.

I immediately realized that, before we
could consider the political, technical and
legal parameters of an eventual agreement, I
had to dedicate myself to changing the mood:
there could be no victory without optimism. I
decided to set a clear intention: even if we did
not know precisely how, a global deal would 
emerge, simply because it was necessary. It 
was that contagious frame of mind that led to 
effective decision-making, despite the enor- 
mous complexities under which we were oper- 
ating. When the Paris agreement was achieved, 
the optimism that people felt about the future 
was palpable — but, in fact, optimism had been 
the primary input. 

Since then, science has become clearer about 
the threats of climate change: now, even our 
children know that business as usual will lead 
to destroyed infrastructure, devastating loss 
of plants and animals, and millions of people 
struggling in regions made uninhabitable from 
rising temperatures and lack of fresh water. 

What is much less clear is what life will look 
like in those places where we do what is neces-
sary to limit warming to 1.5 °C, as stipulated by
the Paris agreement. To get to what we achieved 
in Paris, we moved away from confrontational 
blaming-and-shaming to appreciating shared 
opportunities. Now, we must picture, say, cities 
full of green spaces pulling carbon dioxide 
from the atmosphere; widespread public trans-
port; thriving wildernesses; rural economies
rebooted for sustainable agriculture; and jobs 
in renewable-energy projects. 

Optimism is about acknowledging difficul- 
ties — and losses — yet still designing a better 
future. An excellent example is the European 
Union’s proposed European Green Deal, 
announced in December 2019. This explic- 
itly reframes an urgent challenge as a unique 
opportunity to create a “resource-efficient 
and competitive economy” that will gener- 
ate jobs, purify air and mobilize industry, 
agriculture and other sectors to deliver 
net-zero emissions by 2050. 

My own country, Costa Rica, has already 
launched an economy-wide plan to ‘decarbon- 
ize’ by 2050. This ambitious plan, the first of its 
kind when it was announced last February, will 
expand forests and promote electric taxis and 
public buses. It is based on respect for human 
rights and gender equity, and clearly recog- 
nizes the opportunity for decarbonization to 
revitalize the economy. 

Costa Rica has launched a decarbonization plan that will expand the country’s forest cover.

Most executives already understand that
they need to contribute to climate stabili-
zation just to ensure that their businesses
have a future. The number of companies
setting science-based targets in line with a
1.5 °C trajectory doubled between September 
and December last year. Similarly, the com- 
bined assets managed by the Net-Zero Asset 
Owner Alliance — a group of investors align- 
ing their portfolios with a 1.5 °C future — had 
surged from US$2.4 trillion to $4 trillion within 
two months of its launch in September 2019. 
Leaders in the oil and gas industries have told 
me privately that shareholder and public pres- 
sure, plus questions from their own children, 
have prompted them to shift their practices. 

Despite this, I posit that most people, 
including many of those attending the Davos 
meeting, still harbour the view that it is impos- 
sible to truly transform our economy in one 
decade. We cannot afford such fatalism. Swift 
change has happened before, and without 
being driven by planetary necessity: the global 
Internet is just 30 years old. 

If we can see where we are going (a future
in which humanity does what is necessary to
preserve the planet as we know and love it), we
will take faster, surer steps to get there. That
visualization is all the more important because 
how we are going to get to this future will feel 
unfamiliar. The transition of technologies and 
systems in music and information makes sense 
only because we have seen vinyl records yield 
to streaming services and paper superseded by 
mobile multimedia. We must be ready to shape 
the necessary transition for energy, transport 
and more. And we must understand that this 
transition will be driven collectively. 

The global economy is a huge, complex 
system. As I learnt during my stewardship of
the Paris agreement, if you do not control the
complex landscape of a challenge (and you
rarely do), the most powerful thing you can do
is to change how you behave in that landscape,
using yourself as a catalyst for overall change. 

Imagine a person who wants to run a 
marathon and then concentrates on the fact 
that they can’t yet even run a mile: they begin
to close the space of possibility. But, if that 
person adopts a different mindset, commits 
to a training schedule and visualizes passing 
the finish line, their goal is much more likely 
to be achieved. 

To all the people gathering in Davos, and 
all those watching from the outside, I urge 
you to move firmly into a state of stubborn 
optimism. The Anthropocene, the proposed 
geological age we now live in, does not need 
to go down in history as the age characterized
by human-induced destruction. It can be the 
time when we rewrite our expected future for 
a better one: we still hold the pen. We must 
conceive of success and take immediate steps 
towards it. 


The author 


Christiana Figueres was the executive 
secretary of the United Nations Framework 
Convention on Climate Change from 2010 to 
2016. She is a co-founder of Global Optimism, 
an enterprise that aims to stimulate social 
and environmental change, and co-author of 
the forthcoming book The Future We Choose. 
e-mail: cfigueres@mission2020.global


Readers respond


Correspondence 


Curb spread of virus 
emerging in China 


I applaud Chinese colleagues’
prompt release of the
genome sequence of the virus
responsible for the mystery
respiratory illness in Wuhan in
central China (see Nature
http://doi.org/djhc; 2020). The agent
is a previously unknown type
of coronavirus that is distantly
related to the severe acute 
respiratory syndrome (SARS) 
coronavirus. To curb the spread 
of the virus, its animal reservoir 
must be quickly identified and 
human-to-human transmission 
thoroughly investigated (see 
also go.nature.com/2ua489i). 

The authorities have been 
understandably cautious after 
the early misidentification of 
the SARS pathogen in 2003. 
However, the results of animal 
testing from a seafood market 
in Wuhan, where the virus 
was initially isolated, must be 
released as soon as possible. 
The virology community also 
feels that human-to-human 
transmission should not be 
ruled out without compelling 
evidence. 

This information is 
particularly crucial because 
tens of millions of people will 
be travelling — and consuming 
potentially contaminated 
animal meat — to celebrate 
the Chinese New Year on 25 
January. The public needs clear 
instructions and guidance. 

Controlling the spread of 
emerging and re-emerging 
viruses calls for international 
efforts. China’s research 
collaborations and data-sharing 
must continue — including with 
the United States, despite other 
problems with their relations. 


Shan-Lu Liu The Ohio State 
University, Columbus, Ohio, USA. 
liu.6244@osu.edu 


Grants: don’t 
leave it to luck 


I was shocked to read that a
growing number of funding 
bodies are assigning research 
grants randomly (Nature 575, 
574-575; 2019). As an early- 
career researcher, I might be 
expected to gain from such a
system, given that I could land a
windfall without having my case 
judged against the competition. 
But I want my career to be built 
on achievement, as recognized 
and promoted through 
conventional grant awards — not 
undermined by a lottery system. 

Some researchers might 
see random funding as more 
time-efficient, because it 
dispenses with the review 
process. It spares reviewers 
the burden of differentiating 
between the lowest-ranked 
successful candidate and the 
highest-ranked candidate who 
didn’t make the cut. However, 
for a researcher just starting 
out, a positive review based on 
the applicant’s contributions 
to the literature and other 
scientific merits is crucial for 
advancement. 

And if lottery-based grants 
become widespread, academic 
research will suffer as fruitful 
ideas are arbitrarily stalled. 
Leaving success up to lady luck 
is not a solution.


Howard Vindin University of 
Sydney, Australia. 
hvin6646@uni.sydney.edu.au


Grants: lottery
is laziness 


The idea of a funding lottery 
(Nature 575, 574-575; 2019) is, in 
my view, a classic bureaucratic 
response to a process that
bureaucracy finds too hard
to handle.

The review of scientific grant 
applications depends on an 
assessment of their quality, 
requiring a strict combination 
of evidence and intellectual 
judgement. Stuff that, say the 
bureaucrats. “Let’s make it a 
lottery, and save ourselves 
time and money.” Sure, some 
applications might flourish 
that otherwise would not, 
but what about the high- 
quality research that has been 
carefully constructed over time 
and is suddenly de-funded? 
Such a funding system is, in 
effect, anti-intellectual. It is a
research version of publication 
bibliometrics that focus merely 
on citation counts, not on 
quality. 

Academia must resist 
this bureaucratization of 
research and publishing by 
well-meaning but scientifically 
inept bureaucrats. Otherwise, 
science itself stands to be 
plunged into the same miasma 
of metrics and bureaucracy- 
benefiting processes that have 
already weakened other great 
institutions, many examples 
of which are described in Jerry 
Muller’s book The Tyranny of 
Metrics (see Nature 554, 167; 
2018). 


Climate change: be 
mindful at meetings 


Scientists are keen to lower 
the toll their work takes on 
the planet (see, for example, 
O. Hamant et al. Nature
573, 451-452; 2019). At a
recent Harvard conference
on sociology and climate
change, Hannah Holleman, a
sociologist at Amherst College
in Massachusetts, offered
us a gentle reminder of how
our research is embedded
in everyday practices (see
go.nature.com/3acmulr). 

In her memorable opening 
statement, Holleman drew 
attention to the debt we owe 
to the native peoples whose 
traditional homelands are now 
occupied by the university, the 
natural resources used to build 
the venue, the production of 
sustenance for the event, and 
the fossil fuel needed for us to 
convene. She pointed out that 
the organic materials used would 
return, as waste, to the land. 

This unusual opening to an 
academic discussion landed 
a strong emotional punch. It 
was a powerful reminder — 
even for scholars who are 
well informed and deeply 
committed to solving the 
biodiversity and climate crises — 
of our shared responsibility 
and accountability. It used 
mindfulness as a way to amplify 
the urgency of that message. 
This approach could bear 
further exploration at other 
meetings on climate change.


Figure 1 | The Ganges river delta.


Hydrology 


The changing shapes of river deltas


Nick van de Giesen 


A model has been devised that quantitatively describes
how the shape of a river delta is affected by sediments, tides
and waves. It reveals that the area of delta land is increasing
globally, as a result of human activities upstream. See p.514


Undisturbed river deltas are diverse 
ecosystems that encompass tidal wetlands 
and floodplains. Because of their rich soils 
and convenient positions for trade and trans- 
port, many deltas have also become hotspots 
of socio-economic development. The Nile 
delta, for example, with its iconic triangular
shape, has been one such locus for more than
5,000 years. Not all deltas are triangular,
however: their morphology can vary widely. On
page 514, Nienhuis et al. (ref. 1) report a model that
correlates the forces that shape deltas with
delta morphology, and use it to analyse the
shapes of some 11,000 coastal deltas. This
global overview allows the authors to assess
how delta morphology is affected by changes
in sediment delivery caused by river damming
and soil erosion.

The authors’ model estimates delta 
morphology on the basis of a quantitative 
characterization of three main drivers that 
shape deltas. These are: sediment delivered 
by the river; wave action that redistributes 
sediment along the coast; and sediment trans- 
ported into or out of the delta by tidal flows. 
The relative influences of these drivers were 
used to determine two key morphological met- 
rics; namely, the protrusion of the delta into 
the sea and the shape of the river channel. For 
example, Nienhuis et al. infer from the model 
that when the effects of sediment delivered by 
the river are greater than the effects of wave 
action, deltas protrude relatively far into the 
sea. Alternatively, the authors conclude that 
deltas widen towards the sea into a trumpet 
shape when tidal flows are important and 
sediment delivery is low. Nienhuis et al. vali- 
dated their model by comparing the projected 
morphologies with those of real deltas, and
provide robust statistics on the reliability of
the results, which is a key strength of the study.

Note that the authors’ definition of what 
constitutes a delta is broad (see the Methods 
section of the paper for the criteria used), 
which means that their model is truly global. 
However, the model’s ability to capture the 
general behaviour of all deltas comes at the 
expense of fine-grained accuracy — there 
will almost inevitably be errors in the mor- 
phologies projected for some individual 
deltas. Nevertheless, the model’s results are 
statistically valid at a global level. 

Nienhuis and colleagues used their model 
to estimate the effects of upstream human 
interventions on delta morphology dur- 
ing the period 1985-2015. They found that 
dam building led to decreases in sediment 
delivery, whereas accelerated soil erosion 
caused by deforestation increased sediment 
delivery. Of the approximately 11,000 deltas 
analysed, about 9% are significantly affected 
by reduced sediment delivery, producing a 
total land loss of 127 square kilometres per 
year, whereas about 14% received increased 
sediment, causing a total gain of 181 km² yr⁻¹
during the study period. The reason more 
deltas have experienced an increase in sedi- 
ment delivery, rather than a decrease, is simply 
that the effects of massive deforestation have 
outpaced sediment trapping by dams. 
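
The balance between these two totals can be checked with a few lines of Python. This is only a sketch of the arithmetic using the rounded figures quoted above; the variable names are illustrative, and the delta counts are approximations derived here from the stated percentages, not values reported by the study.

```python
# Rounded figures quoted in the text for 1985-2015.
TOTAL_DELTAS = 11_000         # "approximately 11,000 deltas"
SHARE_LOSING = 0.09           # deltas with reduced sediment delivery
SHARE_GAINING = 0.14          # deltas with increased sediment delivery
LAND_LOSS_KM2_PER_YR = 127
LAND_GAIN_KM2_PER_YR = 181

# Approximate number of deltas in each group (derived, not reported).
deltas_losing = round(TOTAL_DELTAS * SHARE_LOSING)
deltas_gaining = round(TOTAL_DELTAS * SHARE_GAINING)

# Net change in delta land area per year.
net_gain_km2_per_yr = LAND_GAIN_KM2_PER_YR - LAND_LOSS_KM2_PER_YR

print(deltas_losing, deltas_gaining, net_gain_km2_per_yr)  # prints "990 1540 54"
```

On these numbers, roughly 1,000 deltas are losing land and about 1,500 are gaining it, for a net gain of 54 square kilometres per year.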

Previously reported state-of-the-art stud-
ies (refs 2, 3) of global coastal morphology involved
the computationally intensive analysis of 
extremely large archives of satellite images, 
which have become available in the past 
few years. These studies also revealed a net 
increase in land surface area. Many of the 
land gains could be explained by large-scale 
phenomena, such as the disappearance of 
the Aral Sea in central Asia, and by extensive 
land-reclamation projects along the China 
coast. But beyond those special cases, it is 
also crucial to learn in greater detail where 
and why river deltas have gained or lost land 
across the globe. Nienhuis et al. fill in this key 
part of the puzzle. 

The new study also reveals notable regional
patterns. For example, arctic river deltas have 
seen almost no change in morphology. Sedi- 
ment delivery by rivers in North America has 
fallen overall, leading to large land losses — in 
the Mississippi delta, for example. And the 
largest land gains are in eastern South Amer- 
ica and in south, southeast and east Asia, 
where soil erosion due to deforestation has 
caused a net growth in delta areas, despite the 
construction of sizeable dams in these regions. 

Large deltas, such as those of the Niger, Huang
He and Mekong, have great socio-economic 
value. Such densely inhabited deltas typi- 
cally experience many pressures in addition 
to changes in sediment delivery, such as 
stresses associated with groundwater pump-
ing, sand mining, dyke construction and loss
of biodiversity (refs 4, 5). For these highly complex
deltaic systems, local studies will be needed
to assess the problems that adversely affect
their morphology and to define specific solu-
tions (ref. 6). However, most of the deltas considered
by Nienhuis and co-workers are much smaller. 
This could skew the picture painted by the 
overall numerical results, because large del- 
tas have a much greater global impact than 
do small ones, but represent a tiny fraction 
of the total number of deltas analysed in the 
study. For example, the study calculates that 
the net land gain for all deltas was 54 km² yr⁻¹
during the period studied, which seems like 
good news. But this area is tiny compared with 
the 105,000 km’ covered by the Ganges delta 
alone (Fig. 1) — which, with its population of 
170 million people, is subject to a multitude 
of stresses’. We should therefore not be 
complacent about the new findings. 

Nienhuis et al. did not include sea-level rise 
in their model, but sea levels rose by about 
10 cm over the period studied (see go.nature.com/2tpjpxg). 
This will probably not have produced 
observable losses of delta land, given 
the large spatial variability of sea-level rises. 
Nevertheless, it would be interesting to see 
whether measurable losses did occur. The 
authors’ model provides a useful description 
of the background dynamics of changes in 
delta morphology against which the impact 
of rising seas can be measured once sea lev- 
els approach predicted increases of 60 cm 
(ref. 8) or more9, as a result of global warming. 
Severe sea-level rise will undoubtedly cause 
coastline recession in deltas, as it has in the 
geological past". 


Validated global models describing key 
parts of the Earth system are crucial in this 
time of unprecedented human-induced 
climate change. Deltas connect the terrestrial 
and maritime branches of the hydrological 
cycle and the associated sediment fluxes. As 
such, they encapsulate many key indicators 
of global change. By accounting for the baseline 
effects on deltas of human activities such 
as dam building and deforestation, Nienhuis 
and colleagues have provided a fundamental 
framework that will help assessments of 
the impacts of climate change for decades 
to come. 


Cancer immunology 


B cells to the forefront 
of immunotherapy 


Tullia C. Bruno 


Three studies reveal that the presence in tumours of two key 
immune components, B cells and tertiary lymphoid structures, 
is associated with favourable outcomes when individuals 
undergo immunotherapy. See p.549, p.556 & p.561 


Current immunotherapies aim to reinvigorate 
immune cells called killer T cells to fight 
cancer, but only 20% of individuals with the 
disease see a lasting clinical benefit from 
this type of treatment1. Focusing on other 
immune cells in patients’ tumours might help 
us to improve these outcomes. Three studies, 
by Cabrita et al.2 (page 561), Petitprez et al.3 
(page 556) and Helmink et al.4 (page 549), 
now demonstrate that the presence of B cells 


© 2020 Springer Nature Limited. All rights reserved. 


in human tumours in compartments called 
tertiary lymphoid structures (TLS) is associ- 
ated with a favourable response to immuno- 
therapy. These complementary studies add 
to the immunotherapy toolbox by providing 
new ways of predicting prognosis. 

The presence of B cells in tumours has been 
considered to be a predictor of increased 
patient survival>*, but there are reports of 
both anti- and pro-tumour roles for B cells’. 


https://doi.org/10.1038/d41586-019-03943-0 
These differing reports reflect the multiple 
roles that B cells can have in tumours. One 
component of the antitumour function 
of B cells is B-cell activation. This process 
involves the binding of tumour-derived 
proteins to the B-cell receptor protein on 
the cell surface and the subsequent process- 
ing of these tumour-derived proteins into 
smaller fragments called antigens. Further 
co-factors are also involved in activation. 
Activated B cells can release antibodies that 
tag tumour cells for attack by other cellular 
players of the immune system (a process 
known as antibody-dependent cell death)’, 
and can ‘educate’ T cells by presenting them 
with tumour antigens, enabling the T cells 
to target tumour cells effectively’. However, 
B cells in tumours can produce inhibitory 
factors that hinder the function of immune 
cells (Fig. 1). These might be signalling molecules 
that suppress the immune system”? or 
inhibitory molecules on the surfaces of B cells 
that limit the body’s ability to target and kill 
tumour cells. 

TLS are aggregates of immune cells (mostly 
T and B cells) that arise in response to immu- 
nological stimuli. Mature TLS nurture B-cell 
development and function in an inner region 
of the structure called the germinal centre, 
whereas immature TLS do not contain proper 
germinal centres, and might not nurture 
full B-cell function. The presence of TLS 
in a tumour also correlates with increased 
patient survival in many cancer types”. The 
three current studies confirm this trend in the 
context of immunotherapy, demonstrating 
that infiltration of B cells into a tumour, along 
with the presence of TLS, is associated with an 
improved response to this type of treatment. 

Cabrita et al. studied individuals who had a 
type of cancer called metastatic melanoma, 
and Petitprez et al. investigated people with 
soft-tissue sarcoma, a cancer of connective tissue. Both teams 
found that the presence of B cells in TLS in 
the tumour before treatment was associ- 
ated with an increased chance that patients’ 
tumours would respond to immunotherapy. 
Helmink et al. corroborated these findings 
for metastatic melanoma, and reported the 
same pretreatment trend in renal cell carci- 
noma. These authors also demonstrated that, 
during treatment, TLS are more prevalent in 
people who have tumours that are responding 
to treatment than in those whose tumours are 
not. This timing is important — when present 
before treatment, TLS could be considered 
a predictor of patient response to immuno- 
therapy, whereas the presence of TLS during 
treatment indicates that key combinations 
of immune cells are being manipulated to 
induce TLS formation. Identifying these cell 
combinations could help in establishing new 
and effective immune-based therapies. 

Figure 1 | Multifaceted B cells in the tumour microenvironment. B cells are thought to have multiple 
roles in suppressing or promoting the immune system’s ability to kill tumour cells, depending on whether 
they are located in immature or mature compartments called tertiary lymphoid structures (TLS), which 
also contain T cells. a, In poorly structured, immature TLS, one hypothesis is that B cells generate inhibitory 
factors. These might be molecules released from B cells that dampen the response of other immune cells, or 
molecules on the surfaces of B cells that hinder the targeting and destruction of tumour cells. Both of these 
inhibitory mechanisms might arise if B cells have less interaction with T cells and more interaction with the 
malignant tumour. Three studies2-4 now provide indirect evidence that immature TLS are associated with 
low activity of T cells in tumours. b, By contrast, B cells in well-structured, mature TLS can release antibodies 
that could target tumours, and B cells can present a tumour-derived protein called an antigen (yellow) to 
T cells in the tumour, activating the T cells. The studies suggest that the presence of B cells in mature TLS is 
correlated with increased T-cell activity, improving the immune system’s ability to target tumour cells, and 
increasing the likelihood that the tumour will respond to immunotherapy. 

The three groups found that the B-cell and 
TLS signature was often more pronounced 
in responders than in non-responders. Fur- 
thermore, the signature was more prominent 
than typical T-cell signatures currently used 
for understanding immunotherapy outcomes. 
This suggests that B cells and TLS could have 
a key role in antitumour immunity. 

In addition to these synergistic results, 
each study highlights a unique role for B cells 
or TLS in antitumour immunity. First, Cabrita 
et al. demonstrate that B cells in TLS syner- 
gize with killer T cells that could ultimately 
target tumour cells. Second, Petitprez et al. 
describe signatures characteristic of mature 
TLS in sarcoma. This implies that mature TLS 
can exist in tumour sites that are not normally 
thought to be infiltrated by immune cells, a 
phenomenon that has not previously been 
shown. Third, Helmink et al. find increased 
diversity of B-cell receptors in responders 
compared with non-responders. This indi- 
cates that pools of B cells in responders might 
have a greater ability to specifically recog- 
nize tumour antigens than do the B cells of 
non-responders. 

These papers are technologically savvy, 
use patient populations that are statistically 
robust and bring B cells and TLS to the fore- 
front of antitumour immunity. However, there 
is much still to learn. First, more emphasis 
should be placed on understanding how TLS 
form in tumours. It is clear that these structures 
are variable, and can be immature or 
mature. What does this diversity mean for 
the function of B cells in TLS, and what causes 
the induction of one ‘flavour’ of TLS versus 
another? The contribution of environmental 
factors such as smoking or viral and bacterial 
infections should be considered, along with a 
person’s gender, age and tumour type. 

Researchers should also ask whether mature 
TLS could be routinely induced to form 
in tumours, to maximize B-cell immunity. 
Addressing this issue will require investi- 
gation of B cells and TLS in individuals who 
have not yet undergone treatment, as well as 
proper modelling of the human tumour microenvironment. 
Current evidence indicates that 
B cells actually impede antitumour responses 
in most mouse models of cancer” ». However, 
TLS formation is rare in these animals, and a 
lack of TLS might alter the fate and subsequent 
function of B cells. Indeed, more knowledge 
about B-cell function outside TLS is needed 
to provide a complete picture of B cells in the 
tumour microenvironment. 

There is still a need to define the full range 
of functions that B cells perform in tumours. 
In addition to their known roles in producing 
tumour-specific antibodies and presenting 
antigens®”, B cells are likely to have other 
functions — for instance, inducing anti- 
body-dependent cell death®. It will also be 
necessary to link these functions to specific 
B-cell types and to determine whether such 
cells are found inside or outside TLS. There 
are clear biomarkers for B-cell subsets, but 
linking these subsets to functions in human 
tumours would allow us to design treatments 
that optimize specific antitumour activities. 
Furthermore, this knowledge would help us to 
understand whether subsets of B cells perform 
separate tasks, or if there is crosstalk between 
subsets. For example, can the same B cell both 
produce a tumour-specific antibody and present 
antigens to T cells? Some of these studies 
can be done in human tumours, but in-depth 
mechanistic studies will require physiologi- 
cally relevant models that contain naturally 
occurring TLS. 

With regard to clinical implications, the 
current studies suggest that therapeutics to 
enhance B-cell responses should be priori- 
tized as a complement to T-cell-mediated 
immunotherapies. Researchers should now 
ask whether B cells could be engineered to 
target specific tumour antigens, similar to 
current efforts to engineer antigen-targeting 
T cells. More generally, could immunothera- 
pies be improved by inducing B cells to form 
in TLS after a person has received T-cell-based 
immunotherapy? 

Overall, the current studies should act as a 
springboard for future mechanistic studies 
of B cells and TLS in cancer. Understanding 
how current therapies can be combined with 
approaches to harness B cells and TLS will 
be crucial for the development of effective 
B-cell-specific immunotherapies. 


Tullia C. Bruno is in the Department of 
Immunology, University of Pittsburgh, 
Pittsburgh, Pennsylvania 15215, USA, and at 
the UPMC Hillman Cancer Centre, Pittsburgh. 
e-mail: toruno@pitt.edu 


The population of large animals in the 
Gorongosa National Park collapsed 
during the Mozambican civil war 
(1977-92), which led to encroachment 
of the invasive shrub Mimosa pigra. 
Writing in Nature Ecology & Evolution, 
Guyton et al. report that Gorongosa’s 
repopulation with large herbivores has 
reduced the abundance of mimosa to 
pre-war levels (J. A. Guyton et al. Nature 
Ecol. Evol. http://doi.org/djff; 2020). 
By analysing faecal samples from 
Gorongosa’s five main ruminant 
herbivores, including waterbuck 
(Kobus ellipsiprymnus; pictured), the 
authors found that mimosa was the 
main component of the diets of these 
species in 2013-18. They also found 
that the shrub’s density and biomass 
were greater in fenced enclosures that 
excluded herbivores than in unfenced 
areas. 

The authors therefore conclude 
that the burgeoning populations of 
native large herbivores are consuming 
mimosa, and have thereby conferred 
resistance to its invasion in just 
ten years. The findings suggest that 
rewilding is a potentially useful strategy 
for reversing a common form of 
environmental degradation in Africa’s 
protected areas. Andrew Mitchinson 
Versatile strategy for 
making 2D materials 


Wei Sun Leong 


Two-dimensional materials have potential uses in flexible 
electronics, biosensors and water purification. A method for 
producing air-stable 2D materials on an industrial scale, now 
reported, is a key step in bringing them to market. See p.492 


Modern materials science relies on a deep 
understanding of defects — interruptions to 
regular atomic arrangements in crystalline 
solids. Although ‘defects’ brings to mind 
imperfections and blemishes, they often 
make a material more useful than it other- 
wise would be. For example, metal impurities 
such as chromium and iron atoms in corun- 
dum (a crystalline form of aluminium oxide) 
are responsible for the colours of rubies and 
sapphires. Moreover, the addition of impuri- 
ties to silicon has enabled the current era 
of computing and robotics. On page 492, 
Du et al.1 report a method for producing a variety 
of technologically useful two-dimensional 
materials that contain deliberately introduced 
impurities, solving a fabrication problem for 
next-generation devices. 

Transition-metal chalcogenides (TMCs) are 
emerging materials that hold great promise for 
their incorporation into a wide range of appli- 
cations, from batteries and flexible electronics 
to biosensors and water-purification systems. 
They are composed of a transition metal such 
as molybdenum or tungsten and a chalco- 
gen (an element in group 16 of the periodic 
table) such as sulfur, selenium or tellurium. 
The properties of TMC monolayers change 
greatly if the metallic element is altered. In 
particular, these structures can change from 
being normal metals to semiconductors, or 
even superconductors. 

In the past few years, many researchers” * 
have focused on making ultrathin electronics 
that have superior properties to those of exist- 
ing silicon devices, by combining different 
TMC monolayers into a single object known 
as a heterostructure, using a technique called 
chemical-vapour deposition. Other research- 
ers’ have produced functional devices using 
a single TMC in which different regions of 
the material have different properties, such 
as being metallic or semiconducting. How- 
ever, although these techniques are good for 
fabricating prototype devices, they are not 
practical enough for real-world applications. 

The long-standing problem in incorporating 
TMC monolayers into a functional device has 
been the lack of a metallic-phase TMC mono- 
layer that is stable in ambient conditions for 
more than a month*®. Du and colleagues over- 
came this challenge, and made metallic-phase 
TMC monolayers that they show can exist in 
such conditions for about a year. The authors 
achieved this feat by introducing a technology 
based on a process known as doping. 

Doping has shaped the digital revolution 
— the shift from analog to digital electronics 
that began in the second half of the twentieth 
century. The process involves changing the 
electrical conductivities of semiconductors 
such as silicon by adding impurities. Eighty 
years ago’, dopant atoms of boron and phos- 
phorus were added to pure silicon to produce 
materials called p-type and n-type silicon, 
respectively; these form p-n junctions, the 
basis of computing. This doping technology 
continues to be useful today, and is found in 
our everyday electronics. Du and co-workers’ 
doping technology for 2D materials is also 
expected to have a long-term impact on 
the field. 

The authors produced TMC monolayers 
in three steps (Fig. 1). First, they prepared a 
crystal that contained two different transition 
metals (one of which provided impurity atoms 
for TMC doping), an element in group 13 or 
14 of the periodic table, and carbon. Second, 
they heated the crystal at high temperatures 
(873-1,373 kelvin) for 4 hours in an environ- 
ment that contained two gases. One of these 
was a chalcogen-containing gas that supplied 
chalcogen atoms for the TMC; the other gas 
was phosphorus, which provided further 
impurity atoms for TMC doping. Third, the 
authors used a process called liquid exfoliation 
to convert the resulting TMC crystal into TMC 
monolayers in the form of liquid inks. 

Du et al. used this three-step dual-doping 
technology to make, for example, metal- 
lic-phase TMC monolayers of tungsten 
disulfide that were doped with both yttrium 
and phosphorus atoms. They also produced 
undoped TMC monolayers by preparing 
layered crystals that contained one type of 
transition metal, rather than two, and remov- 
ing the source of phosphorus gas. In total, the 
authors made six doped and seven undoped 
TMC monolayers, demonstrating the remarka- 
ble versatility of their approach for producing 
2D materials. 

Figure 1 | Method for producing air-stable transition-metal chalcogenides (TMCs). Du et al.1 demonstrate 
a technology for making monolayers of materials called TMCs that they show can remain stable in ambient 
conditions for about a year. They first prepare a crystal that contains two different transition metals, an 
element in group 13 or 14 of the periodic table, and carbon. They then place the crystal in a container and heat 
it in a furnace for 4 hours, in an environment containing two gases. One of the gases contains a chalcogen (an 
element in group 16 of the periodic table) and the other is phosphorus gas produced by heating phosphorus 
powder in a separate container in the furnace. The result of this process is a TMC crystal. Finally, the authors 
use a process called liquid exfoliation to convert the crystal into TMC monolayers in the form of liquid inks. 

One major advantage of Du and colleagues’ 
method is that the final 2D materials are in the 
form of liquid inks. There is clearly a shift in this 
field towards making high-quality monolayer 
inks for commercialization®’, rather than films 
produced by techniques such as epitaxial 
growth or chemical-vapour deposition. Such 
films require a process known as delamination 
to separate them from their growth substrates, 
which deteriorates the material’s quality and 
necessitates further processing”. By contrast, 
monolayer inks can be readily deposited 
on arbitrary substrates using techniques such 
as inkjet printing or spin coating, and so are 
easily integrated into 3D systems”. 

From a scientific standpoint, 2D materials 
need to be stable and usable in our immedi- 
ate surroundings. Du and colleagues’ findings 
are promising for the field because they 
show that the presence of a low quantity 
(less than 1%) of impurity atoms can stabilize 
TMC monolayers. This result suggests that 
materials researchers should start to explore 
the use of chemical elements to stabilize 
2D materials that would otherwise degrade in 
ambient conditions within hours, rather than 
using encapsulation layers, which complicate 
the monolayer systems. 

The next steps will be for theorists to 
predict suitable ‘impurity stabilizers’ for TMC 
monolayers, and for experimentalists to inves- 
tigate the use of elements that are abundant 
on Earth. In the meantime, it should still be 
possible to build advanced machines for 
precise and reliable dual doping of TMCs, 
because only a low quantity of relatively rare 
yttrium and phosphorus is needed to stabi- 
lize TMC monolayers. Du and colleagues’ work 
demonstrates that, whatever new materials are 
discovered, it is crucial that we understand, 
manipulate and use their atomic-level defects. 
Every atom matters. 
Meet the relatives of 
our cellular ancestor 


Christa Schleper & Filipa L. Sousa 


Microorganisms related to lineages of the Asgard archaea 
group are thought to have evolved into complex eukaryotic 
cells. Now the first Asgard archaeal species to be grown in the 
laboratory reveals its metabolism and cell biology. See p.519 


Complex life forms including plants, animals 
and fungi are known as eukaryotes. These 
organisms are composed of cells that contain 
membrane-bound internal compartments 
such as nuclei and other organelles. Imachi 
et al.1 report on page 519 that a type of microorganism 
called an Asgard archaeon, which 
might shed light on how early eukaryotic cells 
evolved, has finally been cultured in the labo- 
ratory. The achievement will enable detailed 
metabolic and cellular investigation of 
microbes that represent the closest archaeal 
relative of eukaryotes cultured so far. 

It is thought that eukaryotes arose when two 
types of single cell merged, with one engulfing 
the other. A cell from the domain archaea is 
proposed to have engulfed a bacterial cell of 
a type known as an alphaproteobacterium, 
and the engulfed bacterium evolved into 
eukaryotes’ energy-generating organelles — 
mitochondria. 

However, the nature of the ancestral cell that 
engulfed this bacterium is unclear. Genomic 
analyses have strengthened the idea that this 
cell traces back to archaea because many 
archaeal genes involved in central biological 
processes such as transcription, translation 
and DNA replication share a common ances- 
try with (are phylogenetically related to) the 
corresponding eukaryotic genes. Was the 
alphaproteobacterium engulfed by a bona fide 
archaeal cell, or by an archaeal cell that had 
already acquired some eukaryotic charac- 
teristics, such as a nucleus? No fossils have 
been found that could shed light on the early 
eukaryotic ancestors. However, investigation 
of archaeal lineages has offered a way forward. 

Since 2015, on the basis of genomic and 
phylogenetic analyses’, archaea of a newly dis- 
covered phylum termed Lokiarchaeota (after 
the Norse god Loki) have been proposed as the 
closest living relatives of the ancient archaeal 
host cells from which eukaryotes are thought 
to have evolved. Subsequent genomic research 
revealed yet more such lineages, for which 
other Norse gods have provided names (Thor, 
Odin, Heimdall and Hel)**, and which are now 
grouped together with Lokiarchaeota into 
what are collectively termed Asgard archaea 
(Fig. 1). Intriguingly, all of these lineages con- 
tain an unprecedentedly large number of 
genes that encode what are called eukaryotic 
signature proteins (ESPs), which are usually 
found only in eukaryotes****. Heimdallarchaeota 
currently represent the predicted closest 
archaeal relative of eukaryotes on the basis 
of phylogenetic analysis and the ESP content 
of their genomes’. However, all members of 
the Asgard archaea were previously identi- 
fied, and their metabolism predicted, solely 
by their DNA sequences, and thus their cellular 
features have remained unknown until now. 

Imachi and colleagues report that they have 
cultured in the laboratory an Asgard archaeon 
from the Lokiarchaeota phylum that they pro- 
pose to call ‘Prometheoarchaeum syntrophi- 
cum’, which was obtained from deep-ocean 
sediments. The unusual shape and metabolism 
of Prometheoarchaeum prompt the authors 
to propose a new model for the emergence 
of the first eukaryotic cell. This event, pre- 
dicted® to have occurred between 2 billion and 
1.8 billion years ago, is one of the key cellular 
transitions in evolutionary biology, and is also 
a major biological mystery. 

More than six years before Asgards were 
even identified, Imachi and colleagues had 
already started to generate enrichment 
cultures of microorganisms found in deep 
marine sediments”. Their original goal was to 
find organisms that could degrade methane, 
and the authors searched for such microbes 
at a site about 2.5 kilometres below the ocean 
surface off the coast of Japan. 

Imachi et al. set up a flow bioreactor device 
that mimicked the temperature (10 °C) and 
the low-oxygen and low-nutrient conditions 
at this underwater site. Within five years of 
starting this bioreactor work, a highly diverse 
consortium of active bacteria and archaea, 
including Lokiarchaeota, was obtained. 
Small subcultures were then used to grad- 
ually enrich for cultures in which archaeal 
cells were the dominant component, and 
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Microorganisms related to lineages of the Asgard archaea 
group are thought to have evolved into complex eukaryotic 
cells. Now the first Asgard archaeal species to be grown in the 
laboratory reveals its metabolism and cell biology. 


Complex life forms including plants, animals 
and fungi are known as eukaryotes. These 
organisms are composed of cells that contain 
membrane-bound internal compartments 
such as nuclei and other organelles. Writing 
in Nature, Imachi et al.‘ report that a type of 
microorganism called an Asgard archaeon, 
which might shed light on how early eukary- 
otic cells evolved, has finally been cultured in 
the laboratory. The achievement will enable 
detailed metabolic and cellular investiga- 
tion of microbes that represent the closest 
Archaeal relative of eukaryotes cultured so far. 

Itis thought that eukaryotes arose when two 
types of single cell merged, with one engulfing 
the other. A cell from the domain archaea is 
proposed to have engulfed a bacterial cell of 
a type known as an alphaproteobacterium, 
and the engulfed bacterium evolved into 
eukaryotes’ energy-generating organelles — 
mitochondria. 

However, the nature of the ancestral cell that 
engulfed this bacterium is unclear. Genomic 
analyses have strengthened the idea that this 
cell traces back to archaea because many 
archaeal genes involved in central biological 
processes such as transcription, translation 
and DNA replication share a common ances- 
try with (are phylogenetically related to) the 
corresponding eukaryotic genes. Was the 
alphaproteobacterium engulfed by a bona fide 
archaeal cell, or by an archaeal cell that had 
already acquired some eukaryotic charac- 
teristics, such as a nucleus? No fossils have 
been found that could shed light on the early 
eukaryotic ancestors. However, investigation 
of archaeal lineages has offered a way forward. 

Since 2015, on the basis of genomic and 
phylogenetic analyses’, archaea of anewly dis- 
covered phylum termed Lokiarchaeota (after 
the Norse god Loki) have been proposed as the 


closest living relatives of the ancient archaeal 
host cells from which eukaryotes are thought 
to have evolved. Subsequent genomic research 
revealed yet more such lineages, for which 
other Norse gods have provided names (Thor, 
Odin, Heimdall and Hel)*4, and which are now 
grouped together with Lokiarchaeota into 
what are collectively termed Asgard archaea 
(Fig. 1). Intriguingly, all of these lineages 
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contain an unprecedentedly large number of 
genes that encode what are called eukaryotic 
signature proteins (ESPs), which are usually 
found only in eukaryotes”*>*. Heimdallarchae- 
ota currently represent the predicted closest 
Archaeal relative of eukaryotes on the basis 
of phylogenetic analysis and the ESP content 
of their genomes’. However, all members of 
the Asgard archaea were previously identi- 
fied, and their metabolism predicted, solely 
by their DNA sequences, and thus their cellular 
features have remained unknown until now. 

Imachiand colleagues report that they have 
cultured inthe laboratory an Asgard archaeon 
from the Lokiarchaeota phylum that they pro- 
pose to call ‘Prometheoarchaeum syntrophi- 
cum’, which was obtained from deep-ocean 
sediments. The unusual shape and metabolism 
of Prometheoarchaeum prompt the authors 
to propose a new model for the emergence 
of the first eukaryotic cell. This event, pre- 
dicted® to have occurred between 2 billion and 
1.8 billion years ago, is one of the key cellular 
transitions in evolutionary biology, and is also 
a major biological mystery. 

More than six years before Asgards were
even identified, Imachi and colleagues had
already started to generate enrichment
cultures of microorganisms found in deep
marine sediments’. Their original goal was to
find organisms that could degrade methane,
and the authors searched for such microbes
at a site about 2.5 kilometres below the ocean
surface off the coast of Japan.

Figure 1 | The evolution of eukaryotic cells. Imachi et al.1 report that they have cultured a microorganism,
which they call ‘Prometheoarchaeum syntrophicum’, in the laboratory. The microbe belongs to a group
known as Asgard archaea. This is the first time that an Asgard archaeon has been cultured, and has
revealed previously unknown aspects of its cellular biology, including the presence of long protrusions.
This development might shed light on how complex eukaryotic cells evolved. a, It is thought that an
ancient Asgard archaeon interacted with a bacterium from the class Alphaproteobacteria, for example
by exchanging metabolite molecules (grey circles). The mitochondrion, the energy-generating organelle
of eukaryote cells, is thought to have evolved when such a bacterium was taken up in the archaeal cell.
b, This simplified evolutionary tree includes branches of the lineages (Proteobacteria shown in red and
Asgard archaea in blue) that might have contributed to the formation of eukaryotic cells. Dashed lines
on the evolutionary trees represent lineages identified only by genomic analysis and not by organisms
cultured in the laboratory. It is thought that eukaryotic cells evolved from a partnership between an
alphaproteobacterium and a relative of a Heimdallarchaeote (neither of which is known). LUCA: the last
universal common ancestor (the cell(s) from which bacteria and archaea evolved).

Imachi et al. set up a flow bioreactor device
that mimicked the temperature (10 °C) and 
the low-oxygen and low-nutrient conditions 
at this underwater site. Within five years of 
starting this bioreactor work, a highly diverse 
consortium of active bacteria and archaea, 
including Lokiarchaeota, was obtained.
Small subcultures were then used to gradually 
enrich for cultures in which archaeal cells were 
the dominant component, and Prometheo- 
archaeum was successfully enriched in this 
way after seven more years of work. These opti- 
mizations revealed that Prometheoarchaeum 
grows best in conditions that do not directly 
reflect its original habitat: at 20 °C and supple- 
mented with amino acids, peptides and even 
baby-milk powder. 

The authors report that Prometheo- 
archaeum’s growth depends on the presence 
of other microbial partners that in turn rely 
on Prometheoarchaeum for their survival, a
relationship known as syntrophy. The partners
scavenge hydrogen released by Prometheo- 
archaeum, a metabolic product that was 
correctly predicted to be generated by Asgard 
archaea on the basis of genomic data>. The 
authors found that Prometheoarchaeum 
could be enriched to make up more than 80% 
of the cells in the culture, even though it grows
extremely slowly, taking 2 to 4 weeks to rep- 
licate and divide. From preliminary studies 
using isotope analysis, the authors report that 
this organism can degrade externally supplied 
amino acids. However, that does not exclude 
the possibility that it also thrives on other 
nutrients in the growth medium. 

Prometheoarchaeum cells are relatively 
small (300–750 nanometres in diameter),
have lipids characteristic of other archaea,
and show no evidence of eukaryotic-like orga-
nelles. However, the organism forms intriguing
structures on its cellular surface that include
long and often branching protrusions. 

On the basis of its cell shape and small size, 
and on evidence that Prometheoarchaeum 
produces and syntrophically transfers 
hydrogen and formate molecules to other 
organisms, the authors propose a new model
for the emergence of eukaryotic cells — one 
involving three partners. In this model, a 
free-living bacterial ancestor that would give 
rise to mitochondria became entangled with, 
and was then engulfed by, an archaeal host cell 
that itself was in a syntrophic relationship with
a bacterial partner. 

This model is consistent with earlier 
suggestions about the engulfment process 
in eukaryotic evolution”, and emphasizes 
the importance of membrane-mediated 
processes in the origin of eukaryotes”. How- 
ever, extensive cellular protrusions are not 
found exclusively in this Asgard archaeon. 
It would therefore be of interest to investi- 
gate to what extent these protrusions differ 
from those of branched cellular extensions 
previously observed in other archaea such as 
Pyrodictium” or Thermococcus species”. In 
addition, it will be interesting to determine 
whether the ESPs potentially involved in 
membrane remodelling are localized in these 
structures in Prometheoarchaeum. 

The syntrophic interactions that Imachi 
and colleagues propose in their model for 
the origin of mitochondria are based on the 
need for the host cell to adapt to oxygen use 
(as a consequence of rising oxygen levels on 
the ancient Earth). These ideas differ from 
the ‘reverse hydrogen flow’ model, which 
suggests instead that hydrogen produced 
by the archaeon is consumed directly by the 
bacterial mitochondrial ancestor, with no 
need to invoke a hypothetical third partner°. 
Considering that Prometheoarchaeum does 
not directly represent the archaeal ancestor 
of eukaryotes (nor does any other currently 
existing archaeon), other suggested meta- 
bolic exchanges between the archaeal host 
and bacterial mitochondrial ancestor, such as
hydrogen consumption from the archaeal*5
or the bacterial side’, remain plausible as
initial drivers of a syntrophic relationship. In 
any case, the many models for the origin of 
eukaryotes“ highlight the importance 
of initial syntrophic associations** and mem- 
brane-mediated processes”. Interestingly, 
albeit for different reasons, both syntrophy 
and membranes were crucial aspects in an 
engineered synthetic relationship in which 
an Escherichia coli bacterium was maintained 
inside a yeast cell for more than 120 days”. 

Imachi and colleagues’ success in culturing
Prometheoarchaeum after efforts spanning 
more than a decade represents a huge break- 
through for microbiology. It sets the stage 
for the use of molecular and imaging tech- 
niques to further elucidate the metabolism 
of Prometheoarchaeum and the role of ESPs in 
archaeal cell biology. This, in turn, could guide
the direction of future work investigating how 
eukaryotic cells emerged. 

Self-organized criticality is an elegant explanation of how complex structures emerge
and persist throughout nature’, and why such structures often exhibit similar
scale-invariant properties” °. Although self-organized criticality is sometimes captured by
simple models that feature a critical point as an attractor for the dynamics” », the
connection to real-world systems is exceptionally hard to test quantitatively16–21.
Here we observe three key signatures of self-organized criticality in the dynamics of a
driven-dissipative gas of ultracold potassium atoms: self-organization to a stationary 
state that is largely independent of the initial conditions; scale-invariance of the final 
density characterized by a unique scaling function; and large fluctuations of the 
number of excited atoms (avalanches) obeying a characteristic power-law 
distribution. This work establishes a well-controlled platform for investigating 
self-organization phenomena and non-equilibrium criticality, with experimental 
access to the underlying microscopic details of the system. 


Self-organized criticality (SOC) was conceptualized as a way to explain 
the abundance of scale-invariant systems found in nature’. It is thought 
to underlie a range of complex dynamical phenomena, from activ- 
ity in electrical circuits and neural networks”’, to the likelihood of 
avalanches and earthquakes’ as well as how forest fires*"°, diseases’? and 
even ideas spread®. However, despite the fundamental and practical 
importance of SOC phenomena, much-needed controlled experiments 
are hindered by numerous complexities concerning the relevant micro- 
scopic degrees of freedom’®” and even the simplest models (beyond 
mean-field approximations) present serious challenges to theory” »”°. 
SOC can be understood as an organizing principle that governs a 
class of dissipative interacting systems that display three key signa- 
tures: (1) self-organization to a stationary state (bringing observables 
to values that are independent of initial conditions); (2) scale invariance 
of spatio-temporal correlation functions, including bulk observables; 
and (3) critical response to small perturbations, usually encountered in 
the form of avalanches that have a broad range of sizes and durations 
and that are described by power-law distributions. This differs from 
an equilibrium phase transition, where scale invariance and a critical 
response ensue only for a fine-tuned parameter set. The common root 
of these emergent SOC properties is that the respective gap (that is, 
the distance in parameter space from the critical state) is replaced 
by a ‘dynamical gap’ that self-tunes to zero by an intrinsic feedback 
mechanism. This property, and signatures (1)–(3), set SOC apart from
other occurrences of non-equilibrium scaling behaviour—such as 
hydrodynamic long-time tails”, the Kosterlitz-Thouless critical phase 
in two-dimensional quantum fluids” and the transient dynamics of 
turbulent cascades in isolated systems”*”>—which have also been 
studied with ultracold atoms**™; see also related experiments on 
superradiance*””® and scaling in unitary Bose gases. 


In this work, we demonstrate signatures (1)–(3) of SOC in a microscopically
well-controlled physical system: a three-dimensional trapped
gas of ultracold potassium atoms driven to highly excited Rydberg 
states by a laser field (Fig. 1a). The ingredient that leads to SOC is the 
slow, irreversible decay of the excited population to auxiliary inac- 
tive states, which has been largely disregarded in the investigation of 
Rydberg many-body dynamics. This enables the observation of a phase 
transition from a self-organizing active phase to an absorbing phase, 
scale-invariance of the self-organized density and large fluctuations 
of the active density in the form of power-law distributed avalanches. 
Beyond these experimental results, we derive a Langevin equation from 
the underlying microscopic many-body quantum master equation that 
governs driven-dissipative Rydberg dynamics, which coincides with 
one of the emblematic classes of SOC models””. This provides the crucial 
link from the microscopic atomic physics to the observed macroscopic 
SOC phenomenology, and establishes ultracold Rydberg atomic gases 
as a widely tunable and theoretically accessible platform for studying 
self-organization and universality in non-equilibrium dynamics. 


Physical system 

Each of the approximately 10° atoms held in the optical trap can be 
represented by a three-state system: a ground state |g⟩ = |4s_{1/2}, F = 1⟩, an
excited Rydberg state |r⟩, and auxiliary removed states, which we refer
to collectively by |0⟩ (Fig. 1b). The laser field drives the |g⟩ ↔ |r⟩ transition
with a fixed detuning Δ from resonance and with an amplitude
parameterized by the Rabi frequency Ω. In our experiments Δ ≫ Ω, such
that single-atom excitation processes are strongly suppressed. Once
excited, however, atoms can facilitate further excitations (when the
laser detuning is compensated by the interaction energy of Rydberg
pair states) leading to the formation of extended excitation clusters “*.
Alternatively, they can be lost from the system, predominantly by
spontaneously decaying to another hyperfine ground state or to other
untrapped states.

This system permits a microscopic description via a quantum master
equation for the many-body density matrix ρ̂,

$$\partial_t \hat{\rho} = \frac{i}{\hbar}\,[\hat{\rho}, \hat{H}] + \sum_l \mathcal{L}_l(\hat{\rho}) \qquad (1)$$

with the atom-light interaction Hamiltonian Ĥ and Lindblad
superoperator L_l(ρ̂) given by
where σ̂_l^{αβ} = |α⟩⟨β|_l, l and l′ are indices for each atom and r_{ll′} is the relative
distance between any two atoms. Interactions between Rydberg states
are parameterized by the van der Waals coefficients C_6/2π ≈ 0.52 GHz μm⁶
for |r⟩ = |39p_{3/2}⟩, and C_6/2π ≈ 238 GHz μm⁶ for |r⟩ = |66p_{3/2}⟩. Dissipation
is described by L_l(ρ̂), which includes decay (with total rate Γ) and
dephasing (rate γ_de) attributed primarily to residual laser phase noise
and Doppler broadening.

To connect the microscopic dynamics of Rydberg atoms (equation
(1)) to the SOC phenomenology, we apply a systematic coarse-graining
procedure for the collective dynamics (derived in the Supplementary
Information). In brief, we average over the characteristic length scale of
the facilitation process and adiabatically eliminate the rapidly decaying
atomic coherences*“*. We also approximate the atomic medium
as a quasi-homogeneous gas with a smoothly varying density, which is
justified by the fact that the atoms move on a timescale considerably
shorter than the SOC dynamics. The final result is a Langevin equation
for the spatio-temporal density of atoms in the |r⟩ state, ρ_t = ρ(t, r)
(the active component), and the total remaining density n_t = n(t, r),
which is the sum of the populations in the |g⟩ and |r⟩ states (excluding
removed states):


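$$\partial_t \rho_t = \left(D\nabla^2 - \Gamma + \kappa n_t\right)\rho_t - 2\kappa\rho_t^2 + \tau\,(n_t - 2\rho_t) + \xi_t \qquad (2)$$

$$n_t = n_0 - b\Gamma \int_0^t dt'\,\rho_{t'} + D_T \int_0^t dt'\,\nabla^2 n_{t'} \qquad (3)$$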

In these equations, D and D_T are diffusion constants and κ is the
facilitation rate (which together govern the rate of excitation spreading),
τ is the spontaneous excitation rate, n_0 is the initial density,
and b is a dimensionless parameter that governs how fast the decay
depletes the total population. The stochastic part of the evolution is
governed by the autocorrelated multiplicative noise term ξ_t = ξ(t, r)
with variance var(ξ_t) = Γρ_t.

Equations (2) and (3) closely relate to the paradigmatic Drossel-
Schwabl forest fire model”°, except for the absence of a slow regrowth
term for the total density, which would normally bring the system from
an inactive (subcritical) state to the critical state. This regrowth is typically
the slowest scale in the model, and must asymptotically vanish
in order to realize SOC. Nevertheless, in its absence the system still
exhibits a non-equilibrium phase transition”, which can be approached
by starting in the active phase. To illustrate this we present numerical
simulations (Fig. 1c, d), for simplicity focusing on a small one-dimensional
system. In the case b = τ = 0, the system features a non-equilibrium
phase transition from an absorbing phase, in which any excited
component quickly dies out (characterized by ρ_{t→∞} → 0 for κn_0 < Γ), to

Fig. 1 | SOC in an ultracold atomic gas excited to Rydberg states by a laser
field. a, Self-organization process in a cigar-shaped atom cloud showing atoms
in the ground state |g⟩ (blue dots) or excited to a Rydberg state |r⟩ (large red
spheres) via facilitated excitation processes, leading to the buildup of
correlations (represented by red links). b, The laser field couples the |g⟩ ↔ |r⟩
transition with Rabi frequency Ω and detuning Δ, and atoms in the |r⟩ state
either decay to removed states |0⟩ (black circles) or facilitate further Rydberg
excitations. These microscopic processes determine the couplings in the
Langevin equation (equations (2) and (3)) defined in the text (green arrows).
c, Numerical solution of equations (2) and (3) for the population-conserving
system b = 0 (in one dimension) with D = 1 (discretization distance = 1), D_T = 0,
Γ = 10, κ = 10 and τ = 0. As a function of the total density n_0, the stationary active
density ρ_{t→∞} exhibits an absorbing-state phase transition (dotted vertical line),
which acts as an attractor for the SOC dynamics (when b ≠ 0). d, Time evolution
for b = 0.01 showing the spatially averaged active density ⟨ρ_t⟩ (orange) and total
density ⟨n_t⟩ (blue) as the system approaches a stationary state close to the
critical point of the absorbing-state phase transition. The lower panel in d
shows the full spatio-temporal evolution of the active density ρ_t with transverse
coordinate x spanning 128 grid points.


an active phase in which excitations spread throughout the system
from arbitrarily small seed excitations (with ρ_{t→∞} > 0 for κn_0 > Γ). On the
other hand, when b, τ ≠ 0, spontaneous single-atom excitations trigger
the relatively fast facilitated excitation dynamics, although on longer
timescales particle loss introduces a coupling between ρ_t and n_t. Specifically,
the first integral in equation (3) acts as a feedback mechanism,
causing n_t to continuously decrease while in the active phase. When
this loss is much slower than the internal dynamics but much faster
than the spontaneous excitation rate (achieved for κn_0 ≫ bΓ ≫ τ),
the system slowly approaches the critical point of the absorbing-state
phase transition and develops scale-invariant properties, visualized for
example by growing spatio-temporal correlations in the active density
(the fractal-like structures seen around t = 80 ms in the lower panel
of Fig. 1d). This behaviour can be understood in terms of the evolution
of the dynamical gap κn_t − Γ, which is initially positive and then
continuously decreases, owing to population loss, until it asymptotically
reaches zero at the critical point, where the dynamics effectively
stop.
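
The self-tuning of the dynamical gap can be illustrated with a short mean-field estimate. This is only a sketch under stated assumptions, not the derivation used by the authors: diffusion and noise are dropped (D = D_T = 0, ξ_t = 0), the small spontaneous rate τ is neglected, and the active density is assumed to follow the total density adiabatically, which requires bΓ/2 to be much smaller than the gap κn_t − Γ itself. Setting the time derivative in equation (2) to zero in the active phase, and differentiating equation (3), gives

```latex
% Quasi-stationary active density in the active phase (\kappa n_t > \Gamma):
\rho_t \approx \frac{\kappa n_t - \Gamma}{2\kappa},
\qquad
\partial_t n_t = -b\Gamma\,\rho_t
% Combining the two relations yields a closed equation for the gap:
\quad\Longrightarrow\quad
\partial_t\left(\kappa n_t - \Gamma\right) \approx -\frac{b\Gamma}{2}\left(\kappa n_t - \Gamma\right).
```

Within these approximations the gap relaxes exponentially to zero at rate bΓ/2, so the total density approaches the mean-field critical value Γ/κ regardless of n_0, as long as the system starts in the active phase.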


[Fig. 2 key: initial density n_0 = 0.172, 0.153, 0.115, 0.096, 0.081 and 0.056 μm⁻³.]

Fig. 2 | Self-organization: above a threshold value, the remaining total atom
density n_t is attracted to the same stationary-state density independent of
the initial conditions. The Rydberg state used is 39p_{3/2} and the parameters of
the driving laser field are Δ/2π = 30 MHz, Ω/2π = 190 kHz. For high initial
densities n_0 ≳ 0.08 μm⁻³ the time dependence consists of a short initial plateau
followed by fast exponential decay to a stationary state with a fixed density
n_f = 0.075 μm⁻³. For initial densities below n_f the dynamics is effectively
stationary (black points). The solid lines correspond to mean-field solutions to
the effective Langevin equation with parameters given in the text. Each data
point is the average of three measurements. Standard errors for each dataset
are indicated by the representative error bars shown in the key.


Theoretically solving the full dynamical many-body problem 
described by equations (2) and (3) beyond the mean-field level is dif- 
ficult, particularly for large system sizes and in more than one spa- 
tial dimension, owing to the presence of multiplicative noise and the 
importance of strong spatio-temporal correlations". As a result, many 
properties of this class of systems are still actively debated, such as the 
question of whether the system self-organizes towards a truly criti- 
cal state, and whether it fulfils the universal scaling relations that are 
conjectured for SOC””°. In particular, the non-equilibrium critical 
exponents for the model described by equations (2) and (3) have not 
been reliably determined beyond the mean-field level, except in some 
limiting cases. For example, for b = τ = 0 (the number-conserving limit)
the critical behaviour is governed by a critical point in the directed
percolation universality class®**. How this universality changes in the
non-commuting limit of small but non-vanishing b is not conclusively
understood””®, but we expect the universality to be strongly modified, 
because in a renormalization group picture, the fully attractive SOC 
fixed point does not feature an obvious relevant direction, as is the case 
for directed percolation. In what follows, we experimentally implement 
this elusive model and provide a first experimental characterization 
of some of its scale-invariant properties as the system approaches the
non-equilibrium critical point. 


Self-organization mechanism and model verification 


We start our experiments by investigating the full time evolution of the
total remaining density for different initial states. For this we prepare
a gas of atoms in the ground state (ρ_{t=0} = 0) with different initial peak
atomic densities n_0 between 0.056(5) μm⁻³ and 0.172(2) μm⁻³, where
the numbers in parentheses refer to the standard error of the mean
taken over several measurements. The Rydberg excitation laser is then
suddenly switched on with Ω/2π = 190 kHz and Δ/2π = 30 MHz from the
39p_{3/2} state. After an adjustable time t we turn off the excitation laser
and then take an absorption image to determine n_t. Figure 2 shows that
the time evolution of n_t is strikingly nonlinear and exhibits two distinct
types of behaviour, depending on n_0. For high n_0 there is a short initial
plateau in n_t followed by rapid exponential decay, reflecting the initial
growth of the excitation density. This decay stops at a fixed density
n_f = 0.075 μm⁻³ that is constant over a wide range of initial densities
(standard deviation 0.003 μm⁻³), indicating a stable attractor for the
many-body dynamics. By contrast, for n_0 < n_f the dynamics appears
mostly frozen, characteristic of an absorbing phase. These two types
of behaviour and the sudden transition between them signal the underlying
absorbing-state phase transition that depends upon the initial
density and driving strength. On much longer timescales we observe a
slower overall decay, which we attribute to residual single-atom excitations
(and subsequent loss) with a characteristic rate τ/2π = 1.12(2) Hz.
Because of this slow loss, the self-organized state is not sustained 
indefinitely; however, the very large separation of timescales in our 
experiment makes it possible to robustly observe the emergent SOC 
features in the quasi-stationary regime (hereafter referred to as the 
stationary state). 

We now verify that the Langevin equation provides a good theoretical 
description for the experimental observations. Through comparison 
with the data we confirm the required coupling between the active 
density and the total remaining density, as well as the key hierarchy of 
scales: κn_t ≫ bΓ ≫ τ. For this it is sufficient to compare our data with
a homogeneous mean-field approximation to the Langevin equation
(D = 0 and ξ_t = 0). We find that the mean-field solutions (shown as solid
lines in Fig. 2) describe the data well, except for the minor deviation in
the approach to the stationary state seen around t = 2 ms. By simultaneously
fitting all of the data shown with a single set of parameters, we
find Γ/2π = 11.7(9) kHz, κ/2π = 144(10) kHz μm³ and b = 0.059(5), with
the statistical errors estimated using bootstrap resampling. Thus the 
required separation of scales is satisfied by an order of magnitude 
or more, placing our experiments firmly in the regime in which SOC 
is expected. Furthermore, our experimental observations and their 
theoretical confirmation establish the presence of the anticipated 
absorbing-state phase transition and the self-organization to a station- 
ary state that is independent of initial conditions—that is, the system 
displays SOC signature (1). 
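
The homogeneous mean-field limit described above can be explored numerically with a few lines of Python. The sketch below is illustrative only and is not the authors' fitting code: it integrates the noise-free, diffusion-free version of equations (2) and (3) with a hand-written fourth-order Runge-Kutta step, using the fitted values quoted in the text. Converting the quoted values to angular rates with a factor 2π is an assumption about the convention, and the names `rhs` and `evolve` are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi

# Fitted values quoted in the text, converted to rates in ms^-1.
# The factor 2*pi is an assumed convention, not something the text states.
GAMMA = TWO_PI * 11.7     # decay rate (ms^-1)
KAPPA = TWO_PI * 144.0    # facilitation rate (ms^-1 um^3)
B = 0.059                 # dimensionless loss parameter b
TAU = TWO_PI * 1.12e-3    # spontaneous excitation rate (ms^-1)


def rhs(rho, n):
    """Mean-field time derivatives of (rho, n) with D = 0 and no noise."""
    drho = (KAPPA * n - GAMMA) * rho - 2.0 * KAPPA * rho ** 2 + TAU * (n - 2.0 * rho)
    dn = -B * GAMMA * rho  # time derivative of equation (3) without diffusion
    return drho, dn


def evolve(n0, t_end=10.0, dt=1e-3):
    """Integrate from rho = 0, n = n0 (um^-3) for t_end ms; return final (rho, n)."""
    rho, n = 0.0, n0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(rho, n)
        k2 = rhs(rho + 0.5 * dt * k1[0], n + 0.5 * dt * k1[1])
        k3 = rhs(rho + 0.5 * dt * k2[0], n + 0.5 * dt * k2[1])
        k4 = rhs(rho + dt * k3[0], n + dt * k3[1])
        rho += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        n += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return rho, n


if __name__ == "__main__":
    print("mean-field critical density Gamma/kappa =", GAMMA / KAPPA, "um^-3")
    for n0 in (0.172, 0.115, 0.056):
        rho, n = evolve(n0)
        print(f"n0 = {n0:.3f} um^-3 -> n(10 ms) = {n:.4f} um^-3, rho = {rho:.2e} um^-3")
```

In this simplified picture, initial densities above Γ/κ ≈ 0.081 μm⁻³ relax to nearly the same final density, whereas an initial density below it barely changes over 10 ms. The same numbers also show the hierarchy of scales directly: κn_0/2π is of order 10 kHz, bΓ/2π is below 1 kHz and τ/2π is about 1 Hz.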


Scale-invariance of the stationary density 


We now turn our attention to experimental manifestations of the
observed phase transition on the stationary state. In Fig. 3 we examine
the dependence of the stationary density n_f (reached after t = 10 ms of
evolution) on the driving intensity Ω² ∝ κ. For different initial densities
n_0, the stationary state exhibits a clear density-dependent critical intensity
Ω_c² that separates the absorbing phase (with n_f ≈ n_0) from the active
self-organizing phase (with n_f < n_0). For the latter, the data fall onto a
single curve resembling a power law that is independent of the initial
density (dotted blue line in Fig. 3a). Although mean-field theory (solid 
lines) reproduces the qualitative features, the experimental data exhibit 
important quantitative differences, including a shift in the threshold 
intensity and a markedly different power-law exponent. 

To further quantify the scale-invariant properties, we apply the scaling
ansatz n_f = n_0 F(Ω² n_0^{1/β′}). By plotting n_f/n_0 as a function of n_0^{1/β′} Ω²,
all of the data collapse onto a single universal curve (Fig. 3b), with the
best results obtained for β′ = 0.869(6). We find that the scaling function
F(x) is well modelled by the heuristic function F(x) = x_c^β (x^ν + x_c^ν)^{−β/ν}
(dashed blue curve in Fig. 3b), where x_c and ν are free parameters
describing the position and sharpness of the transition region between
absorbing and active phases. For x ≫ x_c the scaling ansatz is a power
law n_f ∝ Ω^{−2β} n_0^{1−β/β′}, and therefore we can identify β as the scaling
exponent that characterizes the stationary density and show that 1 − β/β′
quantifies how (in)sensitive n_f is to the initial density. Fitting the
rescaled data on a log-log scale we obtain β = 0.910(4), ν = 10.6(8) and
x_c = 641(3) kHz² μm^{−3/β′}. The errors in parentheses are the standard
deviation of the fitted parameters obtained via bootstrap resampling.
Agreement with the scaling ansatz is confirmed by the small and evenly 
scattered normalized residuals between the rescaled data and the 

Fig. 3 | Scale invariance of the self-organized stationary state as a function of
the driving intensity Ω². a, Stationary-state density n_f measured at t = 10 ms as
a function of Ω² and for different initial densities n_0 using the same parameters
as in Fig. 2, except with Δ/2π = 18 MHz. For large Ω² and n_0, all points collapse
onto one single power-law curve n_f ∝ Ω^{−2β} (dotted blue line). b, The same data
with rescaled axes to achieve full data collapse, revealing a unique scaling
function (with fit shown by the dashed blue line) for the stationary density n_f.
The inset shows the normalized residuals between the rescaled data and the
fitted scaling function. The solid lines in a, b correspond to mean-field
solutions of the effective Langevin equation. Each data point is the average
of five measurements.


fitted scaling function, spanning both the absorbing and active phases 
(Fig. 3b, inset). The clear power-law dependence additionally rules out 
substantial modifications owing to the finite system size or inhomo- 
geneous trapping geometry. Additional data taken for different densi- 
ties and detunings of the driving field and slightly different 
experimental conditions exhibit a very similar scaling form and confirm
this measured scaling exponent within an accuracy of a few
per cent (Extended Data Fig. 1). By contrast, the mean-field scaling
solution predicts β′_MF = β_MF = 1, which is clearly incompatible with our
data. Although it is still debated to what extent SOC systems exhibit
universal behaviour'®””, it is striking that a single function describes
the stationary state over the entire accessible parameter regime, and 
that this function acquires a scale-invariant form characterized by a 
non-trivial scaling exponent—that is, SOC signature (2). 
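
The limiting behaviour of the scaling function can be checked numerically. The sketch below assumes the heuristic form F(x) = x_c^β (x^ν + x_c^ν)^(−β/ν), which is a reading of the formula in the text rather than a confirmed expression, and uses the fitted values quoted above. The function names and the example value of Ω² are hypothetical.

```python
BETA = 0.910        # scaling exponent quoted in the text
BETA_PRIME = 0.869  # data-collapse exponent quoted in the text
NU = 10.6           # sharpness of the transition quoted in the text
X_C = 641.0         # position of the transition, kHz^2 um^(-3/beta')


def scaling_function(x, x_c=X_C, beta=BETA, nu=NU):
    """Assumed heuristic F(x) = x_c^beta (x^nu + x_c^nu)^(-beta/nu).

    Written in terms of x / x_c, which is algebraically identical and
    avoids raising large numbers to the power nu.
    """
    return (1.0 + (x / x_c) ** nu) ** (-beta / nu)


def stationary_density(n0, omega_sq, beta_prime=BETA_PRIME):
    """Scaling ansatz n_f = n0 * F(omega_sq * n0^(1/beta'))."""
    return n0 * scaling_function(omega_sq * n0 ** (1.0 / beta_prime))


if __name__ == "__main__":
    # Absorbing phase (x << x_c): F -> 1, so n_f stays at n0.
    print(scaling_function(0.01 * X_C))
    # Active phase (x >> x_c): F -> (x_c / x)^beta, a pure power law.
    print(scaling_function(100.0 * X_C), 0.01 ** BETA)
    # Residual dependence of n_f on n0 in the active phase: exponent 1 - beta/beta'.
    print(1.0 - BETA / BETA_PRIME)
```

With the quoted exponents, 1 − β/β′ is about −0.05, so in the active phase doubling the initial density changes the predicted stationary density by only a few per cent. In the mean-field case β′ = β = 1 the dependence on n_0 drops out entirely.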


Power-law-distributed excitation avalanches 


We now show that the SOC state is also evident in the statistical fluc- 
tuations of the active component. For this we use a different detec- 
tion method, which is based on field ionization of the Rydberg excited 
atoms. For the following measurements we use the 66p_{3/2} state for
detection purposes, but otherwise the experimental conditions are 
comparable. The measurement is destructive, so each measurement 
point corresponds to a new experimental realization. 
Figure 4a, b shows a time trace of the temporal evolution of the
remaining density and the instantaneous number of excitations (the 
active component). The remaining density follows the same charac- 
teristic non-exponential time dependence as seen in Fig. 2, except for 
overall slower dynamics that can be explained by the longer lifetime 
and the larger C_6 coefficient for the 66p_{3/2} state, which lowers b in the
effective description (see Supplementary Information). Figure 4b 
shows that the active component undergoes rapid growth at early 
times, which saturates the detector around 2-5 ms, and then reduces 
again as a consequence of the associated fast atom loss. After 10 ms the
remaining density has almost reached the stationary value; however, 
we observe very large fluctuations of the excitation number, ranging 
from almost zero to clusters of up to approximately 800 excitations. 
We interpret this as the strong response of the system to individual 
excitation events that trigger avalanches that have a broad distribution 
of sizes and durations, which is expected as the dynamical gap vanishes
close to the critical point—that is, the behaviour is evidence of SOC 
signature (3). In Extended Data Fig. 2 we present additional evidence 
of this strong response in the bulk observables following a parameter 
quench. The dashed lines in Fig. 4a, b show the mean-field solution to 
the effective Langevin equation, which describes the remaining atom 
number well, but as expected it completely fails to capture the large 
fluctuations. Additionally we observe avalanches over a wide time 
window (up to 40 ms) even though the remaining density appears 
mostly constant. This shows that the system remains close to the SOC 
state for an extended time period, despite the absence of an obvious 
particle reloading mechanism, which would be required to keep the 
system indefinitely at the critical point. 

To investigate the distribution of the avalanche sizes s, we chose a 
fixed time of 25 ms and repeated the experiment 3,630 times. At this 
fixed time the observed excitation spikes are relatively sparse (ena- 
bling their interpretation as individual avalanche events), yet frequent 
enough to obtain sufficient statistics. Figure 4c shows the correspond- 
ing empirical probability-distribution function obtained by binning the 
data using logarithmically spaced intervals and plotted on a double 
logarithmic scale. The empirical probability distribution function is 
well described by a power law that spans 1.5 orders of magnitude and 
has an upper cutoff determined by the finite system size or the detector
saturation (both effects are expected to play a role around s ≳ 500). The
plateau around s < 20 is attributed to the noise floor of the detector. To
confirm that the observed power-law distribution is indeed a feature
of the self-organizing dynamics, we also show in Fig. 4c a comparable
distribution obtained by a resonant excitation pulse of 1 μs duration,
which yields a stretched Poissonian distribution, as expected for mostly
uncorrelated excitations. To estimate the power-law exponent, we
truncate the empirical data in the window 20 < s < 400 (corresponding
to 2,450 measurements), and apply a maximum-likelihood estimation,
yielding a power-law exponent of α = −1.37(2), where the statistical
uncertainty was estimated using bootstrap resampling. The power- 
law exponent falls in a similar range to observations made in other 
conjectured SOC-like systems, such as forest fires*, neuronal networks®,
earthquakes and solar flares®. However, it is important to note that 
non-universal corrections (owing to, for example, the non-vanishing 
dissipation and driving rates or imperfect separation of scales) could 
still affect the apparent critical properties”°. Ultracold atoms offer the 
prospect of controlling these experimental conditions (for example, 
through larger detunings corresponding to lower seed excitation rates) 
and of determining the critical exponents for different dimensionali- 
ties in a single experimental system, permitting more stringent tests 
of universal scaling predictions. 
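
The exponent estimate described above (maximum likelihood on a truncated window, with a bootstrap uncertainty) can be sketched as follows. This is a minimal illustration and not the analysis code used for Fig. 4c: it assumes a continuous power law p(s) ∝ s^(−a) restricted to [s_min, s_max], uses the positive-exponent convention a = −α, and runs on synthetic data. All function names are hypothetical.

```python
import math
import random


def sample_truncated_power_law(a, s_min, s_max, size, rng):
    """Inverse-CDF sampling of p(s) ~ s**(-a) on [s_min, s_max] (a != 1)."""
    lo, hi = s_min ** (1.0 - a), s_max ** (1.0 - a)
    return [(lo + rng.random() * (hi - lo)) ** (1.0 / (1.0 - a))
            for _ in range(size)]


def fit_exponent(samples, s_min, s_max, a_lo=1.01, a_hi=4.0, tol=1e-7):
    """Maximum-likelihood exponent a, found by golden-section search."""
    data = [s for s in samples if s_min <= s <= s_max]
    n = len(data)
    sum_log = sum(math.log(s) for s in data)

    def nll(a):
        # Negative log-likelihood; norm is the integral of s**(-a) over the window.
        norm = (s_max ** (1.0 - a) - s_min ** (1.0 - a)) / (1.0 - a)
        return n * math.log(norm) + a * sum_log

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = a_lo, a_hi
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if nll(c) < nll(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    return 0.5 * (lo + hi)


def bootstrap_error(samples, s_min, s_max, rng, n_boot=200):
    """Standard deviation of the fitted exponent over bootstrap resamples."""
    data = [s for s in samples if s_min <= s <= s_max]
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]
        estimates.append(fit_exponent(resample, s_min, s_max))
    mean = sum(estimates) / n_boot
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n_boot - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic avalanche sizes; 2,450 matches the number of measurements in the window.
    sizes = sample_truncated_power_law(1.37, 20.0, 400.0, 2450, rng)
    a_hat = fit_exponent(sizes, 20.0, 400.0)
    err = bootstrap_error(sizes, 20.0, 400.0, rng)
    print(f"alpha = {-a_hat:.3f} +/- {err:.3f}")
```

Because the window is bounded on both sides, the normalization depends on the exponent, so the estimate is obtained numerically rather than from the closed-form expression valid for an unbounded tail.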

The demonstrated versatility of ultracold Rydberg gases combined 
with the ability to understand and experimentally control the micro- 
scopic physics in this system makes it a unique platform for studying 
non-equilibrium collective behaviour. Future experiments could implement
a mechanism to slowly add particles to the system (that is, an
additional regrowth term in equation (3)) to sustain the SOC state on
even longer timescales. It should also be possible to investigate other
observables beyond the mean-field level, including spatio-temporal
correlations in the active and remaining densities. This would make
it possible to determine multiple critical exponents and scaling relations,
helping to answer long-standing questions about the universal or
non-universal aspects of SOC and its relation to other non-equilibrium
universality classes. Additionally, further experiments could explore
the interface between driven-dissipative and isolated quantum systems
governed by competing classical and quantum dynamical rules,
ultimately leading to a more complete and quantitative understanding
of non-equilibrium universality.

Fig. 4 | Observation of power-law distributed excitation avalanches.
a, Evolution of the remaining density for the 66p;,, state (solid line).
b, Simultaneously measured Rydberg atom number (active component)
integrated over the whole atom cloud, showing large fluctuations of the active
density for t ≳ 10 ms (solid line). Each of the 200 plotted values corresponds to a
new realization of the experiment. The dashed lines in a, b are mean-field
predictions, where the effective volume of the atom cloud in b is adjusted for
optimal agreement. c, Probability distribution for the instantaneous number

Methods


Sample preparation 

Our experiments are performed using a thermal gas of potassium-39 
atoms, loaded directly from a magneto-optical trap into a crossed 
optical dipole trap. The resulting cigar-shaped atom cloud has a temperature
of 40 μK and $e^{-1/2}$ radii of 10 μm × 100 μm. This should be compared
to the characteristic distance between facilitated Rydberg excitations
$r_{\mathrm{fac}} = (C_6/\Delta)^{1/6}$, which for a detuning of Δ/2π = 30 MHz is about 1.7 μm.
The peak number of atoms in the |g) state is 1.3 x 10°, and the density 
determined by in situ imaging is 2.4 x10" cm’. To vary the density while 
holding all other parameters fixed, we reduce the magneto-optical trap 
loading time. The lifetime of the atoms in the trap without Rydberg 
excitation is about 4 s—that is, much longer than the relevant timescales 
for the SOC dynamics. 


Excitation laser 

To excite the atoms to the 39p;,, Rydberg state we use a single photon 
optical transition at a wavelength of 285 nm. This light is produced by a 
frequency-doubled dye laser delivering up to 200 mW of single-mode 
light and is frequency stabilized to a high-finesse cavity, resulting in an 
independently measured linewidth of 400 kHz. The excitation beam is 
aligned parallel to the long axis of the trap and weakly focused to a waist 
much larger than the size of the atom cloud such that it is practically 
uniform. We experimentally determine the Rabi frequency Ω for every
individual repetition of the experiment by logging the respective
single-shot laser power on a photodiode and employing an independent Rabi
frequency calibration based on measuring the light shifts induced by
the laser via Ramsey interferometry.


Numerical simulation of the Langevin equation 

Although the Langevin equation (equations (2) and (3)) is straightforward
to solve in the mean-field approximation, in Fig. 1 we show exemplary
numerical simulations that capture the effects of diffusion and
multiplicative noise terms in a one-dimensional setting. For these
simulations we make use of the XMDS2 (stochastic) differential-equa- 
tion solver package”, assuming a transverse grid size of 128 points 
and a timestep of 2.5 x 10°. The noise term is implemented as a zero- 
mean Wiener process with a standard deviation proportional to √ρ.
However, to ensure numerical stability we found it necessary to impose
a noise cutoff by setting ξ = 0 when ρ < 0.0025n. For b = 0 the solutions
exhibit an absorbing-state phase transition at $n_c = 0.39$ and power-law
scaling consistent with directed percolation universality (in one
dimension, $\beta_{\mathrm{DP}} = 0.276$). For b ≠ 0 we find that the individual time traces
obtained from the full numerical solution are qualitatively very similar 
to the corresponding mean-field solutions. By fitting the numerical 
results in the same manner as performed for the experimental data, 
we obtain slightly larger effective parameters κ and Γ.
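
To make the numerical scheme concrete, the following is a minimal Euler-Maruyama sketch of a one-dimensional stochastic integration with multiplicative noise and the noise cutoff described above. It is not the XMDS2 script used for Fig. 1. Equations (2) and (3) are not reproduced in this section, so the drift terms below are an assumed schematic form (diffusion, facilitated growth κnρ, decay Γρ, saturation 2κρ², and a loss term bΓρ for the remaining density), and all parameter values, including the noise strength sigma, are hypothetical.

```python
import math
import random


def simulate(n0=2.0, rho0=0.1, kappa=1.0, gamma=1.0, b=0.1, diffusion=1.0,
             sigma=0.1, points=128, dt=0.01, steps=500, cutoff=0.0025, seed=1):
    """Integrate an assumed Langevin model on a periodic 1D grid (spacing 1).

    rho is the active density and n the remaining density. The noise is a
    zero-mean Wiener increment with standard deviation proportional to
    sqrt(rho), switched off where rho < cutoff * n.
    """
    rng = random.Random(seed)
    rho = [rho0] * points
    n = [n0] * points
    sqrt_dt = math.sqrt(dt)
    for _ in range(steps):
        new_rho = [0.0] * points
        for i in range(points):
            # Discrete Laplacian with periodic boundaries (rho[-1] wraps around).
            lap = rho[i - 1] - 2.0 * rho[i] + rho[(i + 1) % points]
            drift = (diffusion * lap + kappa * n[i] * rho[i]
                     - gamma * rho[i] - 2.0 * kappa * rho[i] ** 2)
            noise = 0.0
            if rho[i] >= cutoff * n[i]:
                noise = sigma * math.sqrt(rho[i]) * sqrt_dt * rng.gauss(0.0, 1.0)
            # Clip at zero so that the square root stays well defined.
            new_rho[i] = max(rho[i] + dt * drift + noise, 0.0)
        for i in range(points):
            # Particle loss fed by the active density (assumed form).
            n[i] = max(n[i] - dt * b * gamma * rho[i], 0.0)
        rho = new_rho
    return rho, n


if __name__ == "__main__":
    rho, n = simulate(steps=2000)
    print("mean active density:", sum(rho) / len(rho))
    print("mean remaining density:", sum(n) / len(n))
```

In this assumed model the mean-field growth rate κn − Γ changes sign at n = Γ/κ, so for b > 0 the remaining density is driven towards that value, whereas for b = 0 it stays fixed and the transition is controlled by n alone.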


Comparison of the power-law hypothesis to alternative 
distributions 

To test whether the avalanche data are indeed described by a power-law 
distribution we employ the widely used Kolmogorov-Smirnov (KS) test 
against several alternative distributions, including other heavy-tailed 
distributions (following the definitions in ref. °). The KS statistic is 
defined as the maximum distance between the cumulative distribu- 
tion of the empirical data and that of the hypothesized distribution, 
with small values much less than 1 indicating good agreement. In all
cases we minimize the KS statistic as a function of the parameters of 
the hypothesized distribution, restricting the data and the hypoth- 
esized distributions to the range 20 < s < 400. For the data depicted 
in Fig. 4, the obtained KS-test statistics are: 0.015 (power law), 0.102 
(exponential), 0.031 (log-normal), and 0.04 (gamma). This shows that 
the power-law distribution provides a better fit to the data than the 
alternative distributions. The power-law exponent α ≈ −1.38 found via
KS minimization is in excellent agreement with the value obtained via
the maximum-likelihood estimation.
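
The comparison described above can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis behind the quoted KS values: it covers only the power-law and exponential hypotheses, both restricted to the window [s_min, s_max], and minimizes the KS statistic over a simple parameter grid. The function names and grids are hypothetical.

```python
import math
import random


def ks_distance(sorted_data, cdf):
    """Maximum distance between the empirical CDF and a model CDF."""
    n = len(sorted_data)
    d = 0.0
    for i, s in enumerate(sorted_data):
        f = cdf(s)
        # The empirical CDF jumps from i/n to (i+1)/n at each data point.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d


def power_law_cdf(a, s_min, s_max):
    """CDF of p(s) ~ s**(-a) truncated to [s_min, s_max] (a != 1)."""
    lo, hi = s_min ** (1.0 - a), s_max ** (1.0 - a)
    return lambda s: (s ** (1.0 - a) - lo) / (hi - lo)


def exponential_cdf(lam, s_min, s_max):
    """CDF of p(s) ~ exp(-lam * s) truncated to [s_min, s_max]."""
    lo, hi = math.exp(-lam * s_min), math.exp(-lam * s_max)
    return lambda s: (lo - math.exp(-lam * s)) / (lo - hi)


def best_fit(sorted_data, make_cdf, grid):
    """Return (minimal KS distance, parameter) over a parameter grid."""
    return min((ks_distance(sorted_data, make_cdf(p)), p) for p in grid)


def sample_power_law(a, s_min, s_max, size, rng):
    """Synthetic avalanche sizes drawn by inverse-CDF sampling."""
    lo, hi = s_min ** (1.0 - a), s_max ** (1.0 - a)
    return [(lo + rng.random() * (hi - lo)) ** (1.0 / (1.0 - a))
            for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(3)
    data = sorted(sample_power_law(1.37, 20.0, 400.0, 2000, rng))
    a_grid = [1.05 + 0.01 * k for k in range(150)]
    lam_grid = [0.0005 * k for k in range(1, 101)]
    d_pl, a_best = best_fit(data, lambda a: power_law_cdf(a, 20.0, 400.0), a_grid)
    d_exp, lam_best = best_fit(
        data, lambda lam: exponential_cdf(lam, 20.0, 400.0), lam_grid)
    print(f"power law:   KS = {d_pl:.3f} at a = {a_best:.2f}")
    print(f"exponential: KS = {d_exp:.3f} at lambda = {lam_best:.4f}")
```

Other heavy-tailed alternatives such as the log-normal or gamma distribution can be added in the same way by supplying their truncated CDFs to best_fit.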


Detuning dependence and further evidence for non-equilibrium 
universality 

In the following we present additional evidence for the universal nature
of the self-organized stationary state. For this we performed additional
measurements of the stationary density as a function of the driving
intensity but for different detunings of the excitation laser, as shown in
Extended Data Fig. 1. Each dataset shows qualitatively similar behaviour 
to that presented in Fig. 3, clearly showing the transition from an absorb- 
ing phase to a self-organizing active phase. However, these data also show 
that the location of the critical point depends on the laser detuning. 

To further analyse these data we apply the scaling ansatz
$n_f = n_0 F(\Omega^2 \Delta^{d} n_0^{\beta'})$, where β′ = 0.869, and we have included as a new
parameter the detuning rescaling exponent, d. For d = −2.06(1) the data
again collapse onto a single universal curve. In this way we determine
the κ ∝ Ω²/Δ² dependence of the spreading parameter, used elsewhere
in the paper to compare the data with mean-field theory.

Before analysing the scaling properties of the rescaled data, we note that
careful inspection shows that they have a slightly different form to the scaling
function F(x) used to describe the data in Fig. 3b. This is evidenced by
the fit to F(x), shown as a blue dashed line in Extended Data Fig. 1b. The
deviation is most apparent in the normalized fit residuals (Extended
Data Fig. 1, inset) which, in contrast to Fig. 3b, exhibit some structure
(for example, the inverted U-shape of the black data points). Unless
properly accounted for, this deviation between the scaling form of
the data and the heuristic scaling function causes a systematic error in
the determination of the critical scaling exponent. To rectify this, we
model the detuning-dependent data by a generalized scaling function
$F(x) = [1 + (x/x_1)^{\alpha} + (x/x_2)^{\beta}]^{-1}$, where the newly introduced parameters
$x_1 < x_2$ and α < β empirically describe power-law scaling for intermediate
driving intensities. In the asymptotic regime $x \gg x_2$, the scaling function
once again reduces to a power law $n_f/n_0 \propto x^{-\beta}$.


Critical response 

As additional evidence for the system reaching a critical state, we
have investigated the gapless response of the stationary state following
a parameter quench. Assuming the SOC state is indeed an attractor
for the dynamics, on one hand we expect that small perturbations
(for example, a sudden change of the spreading parameter κ)
should trigger avalanche-like processes that eventually bring the
system back to a new critical state corresponding to a lower stationary
density. On the other hand, if the system evolves to a state that is
deep within the absorbing phase, then avalanches can only be triggered
by perturbations larger than a threshold value corresponding
to a non-zero dynamical gap. To measure this response we start from
the stationary state (reached after t = 10 ms) corresponding to different
driving intensities $\Omega_i^2$ (sketched in Extended Data Fig. 2a). We then
perturb the system by quenching the driving intensity to a new
value $\Omega_f^2$ and then wait for a further 10 ms before measuring the
new stationary density. The whole procedure is then repeated
with a slightly larger final driving intensity $\Omega_{f'}^2 \approx \Omega_f^2 + (2\pi \times 50\,\mathrm{kHz})^2$.
From these two measurements we estimate the susceptibility
$\chi = dn_f/d\Omega_f^2 = [n_f(\Omega_{f'}^2) - n_f(\Omega_f^2)]/(\Omega_{f'}^2 - \Omega_f^2)$.
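
The two-point susceptibility estimate described above amounts to a finite difference between two stationary densities measured at neighbouring final driving intensities. A minimal sketch follows; the density model and the numerical values are hypothetical and serve only to exercise the function.

```python
import math


def susceptibility(n_a, n_b, omega2_a, omega2_b):
    """Two-point finite-difference estimate of d n_f / d(Omega_f^2)."""
    if omega2_b == omega2_a:
        raise ValueError("the two driving intensities must differ")
    return (n_b - n_a) / (omega2_b - omega2_a)


# Spacing of the two final driving intensities, (2*pi x 50 kHz)^2.
STEP = (2.0 * math.pi * 50e3) ** 2


def toy_density(omega2):
    """Hypothetical stationary density with a constant slope of -2e-12."""
    return 5.0 - 2.0e-12 * omega2


omega2_a = (2.0 * math.pi * 300e3) ** 2  # hypothetical final driving intensity
omega2_b = omega2_a + STEP
chi = susceptibility(toy_density(omega2_a), toy_density(omega2_b),
                     omega2_a, omega2_b)
```

For the linear toy model the estimate recovers the slope; for measured densities the result is limited by the statistical error of the two density measurements divided by the intensity step.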

Extended Data Fig. 2 shows the measured susceptibility as a function
of $\delta = (\Omega_f^2 - \Omega_i^2)/\Omega_i^2$ for three different initial conditions corresponding
to $\Omega_i < \Omega_c$ (absorbing), $\Omega_i \approx \Omega_c$ (critical) and $\Omega_i > \Omega_c$ (active). For each of
these initial conditions we observe pronounced minima in χ corresponding
to the strongest system response. We interpret the leading
edge on the left side of each minimum as the point where the perturbation
is sufficient to bring the system back to the active phase, thereby
triggering avalanche-like dynamics and extra loss. When starting deep
in the absorbing phase (black circles) the onset occurs at a large value
of δ, which is a measure of the non-zero dynamical gap. By contrast,
the onsets for critical (brown triangles) and active (red squares) initial
states both coincide at δ = 0 within the experimental resolution. We
can compare these data to a prediction of the susceptibility obtained
from a derivative of the experimentally determined scaling function
using β = 0.910. The scaling-function predictions (solid lines in Extended
Data Fig. 2) are in good agreement with the data, whereas mean-field
predictions (dotted lines) systematically fail to capture the widths and
heights of the peaks. However, we note that starting from the initial
active state, the measured response is narrower and slightly stronger
than the scaling-function prediction. This is further evidence that the
system evolves towards a state that is sharply concentrated at the critical
point, instead of one that is a statistical mixture of many different
accessible macro states. From these experiments we confirm that when
starting from a supercritical state (irrespective of the precise value of
$\Omega_i > \Omega_c$), the system self-organizes to a critical state that is characterized
by a vanishing excitation gap, underpinning SOC signature (3).

Fitting the generalized scaling function to the rescaled data yields
β = 0.95(3), where the larger statistical uncertainty reflects the fact that
the generalized model function has more parameters. This is close to
the value β = 0.910(4) determined from the density-dependent data in
the main text. Refitting the density-dependent data with the generalized
scaling function yields β = 0.920(7). This shows that although the
full form of the scaling function is not universal, data taken under very
different conditions concerning initial densities and detuning of the
driving field do in fact share a common universal critical exponent
describing the asymptotic scaling regime.


Role of trap inhomogeneities and residual coherence 

We can also rule out possible modifications to the scaling behaviour 
due to other experimental details such as the inhomogeneous density 
or residual effects of quantum coherence. 


Inhomogeneities. In the experiment the atoms are laser-trapped in a
cylindrical geometry of finite diameter and length, causing a nearly
homogeneous density distribution in the trap centre and smooth
variation of $n_0$ at the boundaries, where $n_0$ smoothly follows the Gaussian
trapping profile of the lasers. To estimate the impact of these
inhomogeneities, we study a local density approximation for the
Langevin equation. In this approximation, ρ(r, t) experiences a constant
background density $n(\mathbf{r}, t) = \tilde{n}(\mathbf{r}, t)\, l(\mathbf{r})$ at each point in space r, which
is modulated by the trapping profile l(r), whereas $\tilde{n}(\mathbf{r}, t)$ only incorporates
fluctuations owing to the coupling to $\rho_t$. An appropriate mean-field
theory considers $\rho_t = \frac{1}{V}\int_V \rho(\mathbf{r}, t)$ as the spatially averaged density
over the system volume V, and $\tilde{n}_{t=0} = n_0$ in the absence of fluctuations.
The corresponding spatially averaged SOC line is located at
$\kappa_c = (\Gamma/n_0)\,\frac{1}{V}\int_V [1/l(\mathbf{r})] \propto n_0^{-1}$, demonstrating that the mean-field
exponent β = 1 is not modified by the inhomogeneous geometry.


Quantum coherence. The evolution of the density averages
(Supplementary Equations (S4) and (S5)) is real and linear in time,
which maps the final Langevin equation to a stochastic differential
equation for classical processes. It incorporates strong, classical
correlations between different atoms but lacks the possibility for long-range
coherence. Coherence between different atoms can be systematically
built in by replacing the adiabatic elimination (Supplementary
Equation (S1)) with the exact solution, which amounts to a shift
[>I+0,(Supplementary Equation (S5)). To leading order it intro- 
duces a coherent contribution, (+ Vy.) Ozmp to the right-hand side 
of Supplementary Equation (S5), where m,is the probability for having 
an excited particle within the coarse grained region / corresponding 
to the characteristic facilitation volume. Analogously to a damped 
harmonic oscillator, this evolution is observable on timescales 
t('+ yg-) <1, but washed out on larger timescales, that is, on the re- 
laxation towards the SOC steady state. Fast coherent processes might 
modify the parameters κ, D and τ, but not the structure of the Langevin
equation. 


Data availability 


The data that support the findings of this study are available from the 
corresponding author upon reasonable request.

Extended Data Fig. 1 | Further evidence for non-equilibrium universality.
a, Stationary-state density $n_f$ measured at t = 10 ms as a function of Ω² and for
different detunings Δ. b, The same data with rescaled axes to achieve full data
collapse, revealing the scaling function (with fit shown by the dashed blue line)
for the stationary density $n_f$. Inset, normalized residuals between the rescaled
data and the fitted scaling function. The dashed blue line corresponds to the
simple scaling function used in the main text, and the solid orange line is a
generalized scaling function that reproduces the asymptotic scaling form
more accurately. Each data point corresponds to a single measurement.

Extended Data Fig. 2 | Response of the SOC state to external perturbations.
a, Sketch of the experimental procedure used to measure the susceptibility
$\chi = dn_f/d\Omega_f^2$ by quenching the spreading parameter κ ∝ Ω² across the
absorbing-state phase transition. b, Experimental data for three different
initial conditions corresponding to the absorbing phase (black circles), critical
phase (brown triangles) and active phase (red squares). The solid lines
correspond to predictions based on the experimentally determined scaling
function, and the dotted lines correspond to mean-field predictions. Each data
point corresponds to the average of eight measurements. For reference we
show two representative error bars, corresponding to the standard error of the
mean.

Universal quantum information processing requires the execution of single-qubit and
two-qubit logic. Across all qubit realizations, spin qubits in quantum dots have great
promise to become the central building block for quantum computation. Excellent
quantum dot control can be achieved in gallium arsenide, and high-fidelity qubit
rotations and two-qubit logic have been demonstrated in silicon, but universal
quantum logic implemented with local control has yet to be demonstrated. Here we
make this step by combining all of these desirable aspects using hole quantum dots in
germanium. Good control over tunnel coupling and detuning is obtained by
exploiting quantum wells with very low disorder, enabling operation at the charge
symmetry point for increased qubit performance. Spin-orbit coupling obviates the
need for microscopic elements close to each qubit and enables rapid qubit control
with driving frequencies exceeding 100 MHz. We demonstrate a fast universal
quantum gate set composed of single-qubit gates with a fidelity of 99.3 per cent and a
gate time of 20 nanoseconds, and two-qubit logic operations executed within
75 nanoseconds. Planar germanium has thus matured within a year from a material
that can host quantum dots to a platform enabling two-qubit logic, positioning itself
as an excellent material for use in quantum information applications.


Gate-defined quantum dots were recognized early on as a promising
platform for quantum information and many materials have been
investigated as hosts for the quantum dots. Initial research mainly
focused on the low-disorder semiconductor gallium arsenide. Steady
progress in the control and understanding of this system culminated in
the initial demonstration and optimization of spin qubit operations
and the realization of rudimentary analogue quantum simulations.
However, the omnipresent hyperfine interactions in group III-V materials
seriously deteriorate the spin coherence. Considerable improvements
to the coherence times could be achieved by switching to the
group IV semiconductor silicon, in particular when defining spin qubits
in an isotopically purified host crystal with vanishing concentrations
of non-zero nuclear spins. This enabled single-qubit rotations with
fidelities beyond 99.9% and the execution of two-qubit logic gates
with fidelities up to 98%, underlining the potential of spin qubits
for quantum computation. Nevertheless, quantum dots in silicon are
often formed at unintended locations, and control over the tunnel
coupling determining the strength of two-qubit interactions is limited.
Moreover, the absence of a sizable spin-orbit coupling for electrons
in silicon requires the inclusion of microscopic components such as
on-chip striplines or nanomagnets close to each qubit, which complicates
the design of large and dense two-dimensional (2D) structures.
Scalability thus remains a challenge for these systems, and a platform
that can overcome these limitations would be highly desirable.

Hole states in semiconductors typically exhibit strong spin-orbit
coupling (SOC), which has enabled the demonstration of fast
single-qubit rotations. Furthermore, whereas valley degeneracy
complicates qubit definition for electrons in silicon, this is absent for
holes, and excited states can be well separated in energy. In silicon,
unfavourable band alignment prevents strain engineering of low-disorder
quantum wells for holes, restricting experiments to
metal-oxide-semiconductor structures. Research on germanium has mostly
focused on self-assembled nanowires and has demonstrated single-shot
spin readout and coherent spin control. However, strained
germanium can reach hole mobilities of $\mu > 10^{6}\ \mathrm{cm^2\,V^{-1}\,s^{-1}}$, and undoped
germanium quantum wells were recently shown to support the formation
of gate-controlled hole quantum dots. Now, the crucial challenge
is the demonstration of coherent control in this platform and the
implementation of qubit-qubit gates for scalable quantum information
with holes.

Here we make this step and demonstrate single- and two-qubit
logic with holes in planar germanium. We fabricate devices on silicon
substrates, using standard manufacturing materials. We grow
undoped strained germanium quantum wells, measured to have high
hole mobilities $\mu > 5 \times 10^{5}\ \mathrm{cm^2\,V^{-1}\,s^{-1}}$ and a low effective hole mass
$m_h = 0.09\,m_e$, extrapolated to reach $m_h = 0.05\,m_e$ at zero density, with
$m_e$ the electron rest mass. This allows us to define quantum dots of
comparatively large size, and we find excellent control over the exchange
interaction between the two dots. We operate in a multi-hole mode,
reducing challenges in tuning and characterization, which is advantageous
for scaling. We make use of the spin-orbit interaction for qubit
driving and perform single-qubit rotations at frequencies exceeding
100 MHz. This advantage of fast driving becomes further apparent
in coherently accessing the Hilbert space of a two-qubit system.
For example, in silicon the execution of a controlled NOT (CNOT) gate
implemented with an on-chip stripline has been shown using
microsecond-long pulses, and this timescale can be reduced to 0.2-0.5 μs by
incorporating nanomagnets. Here we demonstrate that the spin-orbit
coupling of holes in germanium together with the sizable exchange
interaction enables a CNOT within 75 ns.

A scanning electron microscope image of the germanium two-qubit
device is shown in Fig. 1a. To accumulate holes and define two quantum
dots, the circular plunger gates are set to negative potential ($V_{\mathrm{P1}}$,
$V_{\mathrm{P2}}$ = −2 V). The tunnel coupling between the dots $t_c$ and the tunnel
couplings to the source and drain reservoirs ($t_S$, $t_D$) are controlled by
the barrier gates BC, BS and BD, respectively. Working in a virtual gate
voltage space ($V_{\mathrm{vP1}}$, $V_{\mathrm{vP2}}$, $V_{\mathrm{vBS}}$, $V_{\mathrm{vBD}}$ and $V_{\mathrm{vBC}}$), we can independently tune
these properties (see Supplementary Videos 1-3 online for video-mode
operation). We measure the transport current through the double dot
system (Fig. 1c, d), and for certain hole occupations (Extended Data
Fig. 3) we observe a suppression of the transport current for a positive
bias voltage $V_{\mathrm{SD}}$ = 1 mV, caused by Pauli spin blockade (PSB) (see Fig. 1e).
We make use of the blockade as an effective method for spin-to-charge
conversion, as well as to initialize our two-qubit system in the blocked
|↓↓⟩ ground state.

Fig. 1 | Fabrication and operation of a planar germanium double quantum
dot. a, False-coloured scanning electron microscope image of the two-qubit
device, where Ohmic contacts are indicated in yellow, the barrier gate layer is
depicted in green and the plunger gate layer in purple. Two hole quantum dots,
indicated by the blue and red arrows, are formed in a high-mobility Ge quantum
well and controlled by the electric gates. The direction of the external field $B_0$ is
indicated by the black arrow. b, Schematic cross-section of the system, where
quantum dots are formed below plunger gates P1 and P2, while the different
tunnelling rates can be controlled by barrier gates BS, BD and BC. c, Transport
current through the double dot as a function of plunger gate voltages for weak
(top) and strong (bottom) interdot coupling, mediated by a virtual tunnel gate.
d, Charge stability diagram of the qubit operation point, where the dashed lines
correspond to the charge transitions. The detuning axis is indicated by the
dotted line, with label R corresponding to the qubit readout point. To allow
coherent control of the isolated spin states, a two-level voltage pulse on gates
P1 and P2 is used to detune the dot potentials and prevent tunnelling to and
from the dots during the manipulation phase (label M). e, Transport current
through the double dot as a function of plunger gate voltage for positive (left)
and negative (right) bias. Pauli spin blockade becomes apparent from the
suppression of the transport current for the positive bias direction, up to the
singlet-triplet energy splitting of $E_{\mathrm{ST}}$ = 0.6 meV. f, Illustration of the energy
landscape in our double-quantum dot system. g, Resonance frequency, $f_{\mathrm{res}}$, of
the two qubits as a function of the external magnetic field, showing the
individual qubit resonances.

Taking advantage of the strong spin-orbit coupling, we are able to
implement a fast manipulation of the qubit states by electric dipole
spin resonance (EDSR). We tune the device to a readout point within
the PSB region (indicated by the label R in Fig. 1d) and apply an electric
microwave excitation to gate P1. When the frequency of the microwave
excitation matches the spin resonance frequency of either qubit, PSB
is lifted and an increase in the transport current can be observed. We
extract the resonance frequency of each qubit as a function of external
magnetic field strength $B_0$ (Extended Data Fig. 4) and observe two
distinct qubit resonance lines with g-factors $g_1$ = 0.35 and $g_2$ = 0.38
(Fig. 1g). The difference in g-factors between the two dots is likely to be
caused by slightly different hole fillings and thus quantum dot orbitals.
As an effect of the spin-orbit coupling, a strong orbital dependence of
the effective g-factor is typically measured in hole quantum dots.
Furthermore, the effective g-factor can be tuned electrically as a direct
result of the SOC (see for example Fig. 3c, d), thereby guaranteeing
independent control of the different qubits. We observe that the
resonance frequency of both qubits remains stable over several hours, with
discrete jumps at longer timescales as presented in Extended Data Fig. 5.
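
As a rough numerical illustration of the g-factors quoted above, the sketch below converts between a g-factor and a spin resonance frequency using the Zeeman relation h f = g μ_B B. This relation is an assumption for illustration; it neglects exchange shifts and any field dependence of g. The field of 0.5 T is the value used later for single-qubit operation, and the function names are hypothetical.

```python
MU_B = 9.2740100783e-24  # Bohr magneton in J/T (CODATA 2018)
PLANCK = 6.62607015e-34  # Planck constant in J s (exact in the SI)


def resonance_frequency(g, b_field):
    """Spin resonance frequency in Hz for g-factor g at field b_field (T)."""
    return g * MU_B * b_field / PLANCK


def g_factor(frequency, b_field):
    """Effective g-factor from a resonance frequency (Hz) at field b_field (T)."""
    return PLANCK * frequency / (MU_B * b_field)


f_q1 = resonance_frequency(0.35, 0.5)  # roughly 2.45 GHz
f_q2 = resonance_frequency(0.38, 0.5)  # roughly 2.66 GHz
splitting = f_q2 - f_q1                # roughly 0.21 GHz between the two qubits
```

Under these assumptions the two resonances are separated by about 0.2 GHz at 0.5 T, which is consistent with the two distinct resonance lines being individually addressable.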

We developed a measurement technique in which we measure the
averaged transport current over N repeated pulse cycles and subtract
a reference measurement using a lock-in amplifier, to mitigate slow
variations in the transport current (see Methods), as is indicated
in Fig. 2a. After readout, the system is left in the blocking |↓↓⟩
state, serving as the initialization of our qubits. We now operate the
device in the single-qubit transport mode in an external field of
$B_0$ = 0.5 T and use the second qubit (Q2) as a readout ancilla. Coherent
control over the qubit is demonstrated in a Rabi experiment, where
the spin state of qubit 1 (Q1) is measured as a function of microwave
pulse length $t_p$ and power P, as shown in Fig. 2b. By increasing the power
of the microwave pulse, we can reach Rabi frequencies of over 100 MHz,
at an elevated field $B_0$ = 1.65 T (Extended Data Fig. 6).

To determine the control fidelity, which describes the accuracy of
our quantum gates, we implement randomized benchmarking of the
single-qubit Clifford group (Fig. 2c). The measured decay curve of the
qubit state as a function of sequence length m is shown in Fig. 2d, from
which we extract a single-qubit control fidelity of $F_C$ = 99.3%, using gate
times $t_\pi$ = 20 ns and $t_{\pi/2}$ = 10 ns. In Fig. 2e, we show the gate fidelities for
the different π and π/2 gates as obtained by interleaved randomized
benchmarking, where each randomly drawn gate is followed by the
respective interleaved gate (see Fig. 2c). All individual gate fidelities
are above 99%, with the infidelity for π/2 gates being approximately half
that of the π gates, on account of the difference in pulse length.
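
The step from a randomized-benchmarking decay to a fidelity can be sketched as follows. This is a minimal illustration under the standard single-exponential model, signal(m) = A p^m + B, with the average fidelity F = 1 − (1 − p)(d − 1)/d for dimension d = 2; whether the fits in Fig. 2d, e use exactly this convention is an assumption. The offset B is taken as known, the data are synthetic and noise-free, and the function names are hypothetical.

```python
import math


def clifford_fidelity(p, dim=2):
    """Average fidelity per Clifford from the depolarizing parameter p."""
    return 1.0 - (1.0 - p) * (dim - 1) / dim


def interleaved_gate_fidelity(p_ref, p_int, dim=2):
    """Fidelity of an interleaved gate from reference and interleaved decays."""
    return 1.0 - (1.0 - p_int / p_ref) * (dim - 1) / dim


def fit_decay(lengths, signal, offset=0.0):
    """Estimate p from a linear least-squares fit of log(signal - offset) vs m."""
    xs = list(lengths)
    ys = [math.log(s - offset) for s in signal]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return math.exp(slope)  # slope = log(p)


if __name__ == "__main__":
    p_true = 2.0 * 0.993 - 1.0  # decay parameter corresponding to F = 99.3%
    lengths = range(1, 200, 10)
    signal = [0.5 * p_true ** m for m in lengths]
    p_fit = fit_decay(lengths, signal)
    print(f"fitted p = {p_fit:.4f}, fidelity = {clifford_fidelity(p_fit):.4f}")
```

With measured data the offset and amplitude would normally be fitted together with p in a nonlinear fit; the log-linear form is used here only to keep the sketch dependency-free.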

We extensively characterize the coherence in our system 
at an exchange coupling of J/h = 20 MHz and find T3 g,= 833 ns 
and 7} 92 = 419 ns, which can be extended by performing a Hahn echo 
toT} a= 1.9 psandT} y= 0.8 pis (data in Extended Data Fig. 7), as indi- 
cated in Fig. 2f. These coherence times compare favourably to 

$=130ns for germanium hut wires” and 7$=270ns for holes 
in silicon’’. Electrons in GaAs have an even shorter dephasing time”, 
withT3=10 ns. The limited 73 in GaAs is due to hyperfine interactions, 
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Fig. 2| Qubit control, gate fidelity and quantum coherence of planar 
germanium qubits. a, Measurement sequence used for the Rabi driving 
measurements. Measurement cycles with EDSR pulses are alternated with 
reference cycles without a microwave tone, allowing an efficient background 
current subtraction. Each cycle is repeated Ntimes, such that measurement 
and reference cycles alternate at a typical lock-in frequency of fineas = 89.75 HZ. 
b, Colour map of the differential bias current A/,, as a function of microwave 
pulse time ¢, and power P, where clear Rabi rotations on Qlcan be 


which can be mitigated to a large extent by using nuclear notch filter- 
ing”’, leading to 7, = 800 ps. This source of dephasing can be avoided 
altogether by using group IV materials with nuclear spin-free isotopes””. 
This has led to 7, =28 ms for electrons in isotopically purified silicon”, 
and isotopic purification may also increase the quantum coherence in 
germanium. Furthermore, we observe spin lifetimes of 7,,.,;=9 ps and 
T,92 = 3 US. We have found that these lifetimes increase exponentially 
when lowering the tunnel coupling between each qubit and its respec- 
tive reservoir (Extended Data Fig. 8), and relaxation times of 7,>100 ps 
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Fig. 3 | Tunable exchange coupling and operation at the charge symmetry 
point.a, Illustration of the relevant energy levels in our hole double quantum 
dot with zero (green) and finite (black) exchange coupling/ between the dots. 
Six energy levels are considered: the four different (1,1)-charge states as well as 
the (2, 0), and (0, 2), singlet charge states in which both holes occupy the same 
quantum dot. Four individual transitions can be driven, corresponding to the 
conditional rotations of the two-qubit system. The size of the exchange 
interactionis equal to//h=f,—f,=f, —f;.b, Measurement pulse cycles used to 
map out the exchange splitting of Q1 (top) and Q2 (bottom). Asa result of the 
demodulation of the alternating cycles, transition /, 3 gives anegative signal 
and transition/, ,, results in a positive signal. c,d, EDSR spectra of QI (c) and 
Q2 (d) asa function of the detuning €. The exchange splitting canbe tunedtoa 
minimum at €=0 and increases closer to the (m, n)—(m +1, n-1) and (m, n)- 
(m-1,n+1) charge transitions. e, Exchange interactionas a function of eas 


observed. a.u., arbitrary units. c, Schematic illustration of the (interleaved) 
randomized benchmarking sequence applied to Q1. C corresponds toasingle 
Clifford gate, with m being the total number of applied random Clifford gates. 
d, Differential bias current as a function of m for the randomized benchmarking 
sequence on Q1. The extracted control fidelity is F. = (99.3 + 0.05)%.e, Gate 
fidelities for the m and 1/2 gates. Error bars correspond to lo. f, Spin coherence 
and life times for Qland Q2. Error bars correspond to lo. 


have been reported for germanium nanowires, both giving good
prospects for increasing the relaxation time by closing the reservoir
barrier during operation.

When the manipulation of both qubits is combined, the coupling of
the two qubits (exchange interaction $J$) becomes apparent. As is
illustrated in Fig. 3a, the resonance frequency of each of the qubits is shifted
when the other qubit is prepared in its |↑⟩ state. The strength of this
interaction depends on the inter-dot tunnel coupling $t_0$, as well as the
detuning ε of the dot potentials. By changing the amplitude of voltage
extracted from c, d. Fitting the exchange coupling yields an interdot tunnel
coupling $t_0/h$ = 1.8 GHz and charging energy $U$ = 1.46 meV. f, The interdot tunnel
coupling can also be controlled by gate BC. Changing the potential on this gate,
while keeping ε = 0, allows good control over the exchange interaction between
the two qubits. g, Coherence time $T_2^*$ of both qubits as a function of detuning
voltage $V_\varepsilon$. When the slope of the resonance line is equal to zero, the qubit is
expected to be, to first order, insensitive to charge noise. Solid lines indicate
fits of the data to $1/T_2^* = a\,\delta f/\delta V_\varepsilon + 1/T_0$, with $\delta f/\delta V_\varepsilon$ the numerical
derivative of the resonance line frequency as a function of detuning, $T_0$ the
residual decoherence and $a$ a scaling factor. It can be observed that $T_2^*$ is indeed
longest when the slope of the resonance line is closest to zero. Error bars
correspond to 1σ. h, Resonance frequency of transitions $f_1$ and $f_3$ as a function of
detuning voltage.
Fig. 4 | Fast two-qubit logic with germanium qubits. a, EDSR spectra of both
qubits. Resonance peaks can be observed, corresponding to the four individual
transitions indicated in Fig. 3a. The peaks are power-broadened, and the
linewidth is thus determined by the Rabi frequency. b, Controlled qubit
rotations can naturally be performed by selectively driving each of the four
transitions. A CX gate is achieved at $t_p = t_\pi$ on the driven transition. A small off-resonant


pulse to point M (dotted line in Fig. 1d), we can map $J$ as a function of
the detuning ε. This is shown in Fig. 3c, d, where the subtraction of two
pulse sequences in the measurement (see Fig. 3b) results in a positive
signal for the unprepared qubit resonances and a negative signal for
the prepared states (see Extended Data Fig. 2). As shown in Fig. 3e, the
exchange coupling, which is reflected in the frequency difference between
the initial and prepared resonance positions, is very well described by
a simple model using $J = 4Ut_0^2/[U^2 - (\alpha\varepsilon - U)^2]$. Here, $U$ is the
charging energy of the quantum dots, α = 0.23 is the lever arm of P1 and P2,
and the interdot tunnel coupling is $t_0/h$ = 1.8 GHz. In addition, the
strength of $t_0$ can be tuned by using the central barrier BC (Fig. 3f).
Here, we use a virtual gate voltage, where $V_{BC}$ is set while
compensating its influence on the dot potentials by appropriate
corrections to $V_{P1}$ and $V_{P2}$. As a result of this full control over the
coupling, we are able to operate the qubits at a mostly charge-insensitive
point of symmetric detuning, where the qubit resonance frequencies
are the least susceptible to changes in the electric field, while choosing
an exchange coupling strength large enough for rapid two-qubit
operations. The advantage of this reduced sensitivity to detuning noise is
demonstrated in Fig. 3g, where the dephasing time $T_2^*$ of both qubits
is measured as a function of ε. Here, $T_2^*$ strongly increases where the
slope of $f_{1(3)}$ with respect to the detuning for Q1 (Q2) is minimal, with
the longest average phase coherence reached in the flat region at
$V_\varepsilon$ = 6 mV.
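To make the detuning dependence concrete, the following is a minimal sketch that evaluates the simple model $J = 4Ut_0^2/[U^2 - (\alpha\varepsilon - U)^2]$ with the fitted values quoted in the text ($t_0/h$ = 1.8 GHz, $U$ = 1.46 meV, α = 0.23). The function name, the choice of millivolts for the detuning voltage and the 3 mV offset are illustrative assumptions, not part of the original analysis.

```python
H_EV_S = 4.135667696e-15  # Planck constant in eV*s


def exchange_hz(eps_mv, t0_hz=1.8e9, u_mev=1.46, alpha=0.23):
    """Exchange J/h in Hz from J = 4*U*t0^2 / (U^2 - (alpha*eps - U)^2)."""
    u_ev = u_mev * 1e-3
    t0_ev = H_EV_S * t0_hz
    # Detuning energy measured from the symmetry point alpha*eps = U.
    shifted_ev = alpha * eps_mv * 1e-3 - u_ev
    j_ev = 4.0 * u_ev * t0_ev**2 / (u_ev**2 - shifted_ev**2)
    return j_ev / H_EV_S


eps_sym_mv = 1.46 / 0.23  # detuning voltage at which alpha*eps = U (about 6.3 mV)
j_sym_mhz = exchange_hz(eps_sym_mv) / 1e6
j_off_mhz = exchange_hz(eps_sym_mv + 3.0) / 1e6  # 3 mV away from symmetry

print(f"J/h at the symmetry point: {j_sym_mhz:.1f} MHz")
print(f"J/h 3 mV away from symmetry: {j_off_mhz:.1f} MHz")
```

Under these assumptions the model has its minimum, $J = 4t_0^2/U$, where $\alpha\varepsilon = U$, which falls near the flat region at about 6 mV mentioned above, and $J$ grows on either side of that point.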

The direct control over the tunnel coupling enables us to tune the
exchange interaction to a sizable strength of $J/h$ = 39 MHz at the
symmetry point, as demonstrated in Fig. 4a. We exploit this to obtain fast
selective driving and operate in an exchange always-on regime. Full
control is obtained by applying microwave pulses at the four resonant
frequencies, while further gate pulses controlling $J$ are not needed.
A pulse at a single resonance frequency will result in a conditional
rotation of the target qubit, as we show in Fig. 4b. A CX-operation can be
achieved by setting $t_p$ to give a maximum signal, corresponding to
a conditional π-rotation on the target qubit. The slight off-resonant
driving that can be observed is mitigated by choosing the driving
speed such that $t_{CX} = t_{\pi,\mathrm{resonant}} = t_{2\pi,\mathrm{off\text{-}resonant}}$. A fast CX-operation is thus
achieved within $t_{CX,Q1}$ = 55 ns and $t_{CX,Q2}$ = 75 ns, with Q1 and Q2 as the
target qubits respectively.
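The timing condition $t_{CX} = t_{\pi,\mathrm{resonant}} = t_{2\pi,\mathrm{off\text{-}resonant}}$ can be illustrated with a textbook two-level picture. The sketch below assumes (this is not stated in the text) that the neighbouring transition is detuned by $J/h$, that both transitions are driven with the same strength, and that the off-resonant transition rotates at the generalized Rabi frequency $\sqrt{f_R^2 + \Delta^2}$. The function name and the optional integer `k` (allowing k full off-resonant rotations) are hypothetical additions for illustration.

```python
import math


def cx_drive_settings(detuning_hz, k=1):
    """Return (rabi_hz, t_cx_s) such that a resonant pi rotation takes
    exactly as long as k full 2*pi rotations of the detuned transition."""
    # Condition: sqrt(f_R^2 + detuning^2) = 2*k*f_R  =>  f_R = detuning / sqrt(4k^2 - 1)
    rabi_hz = detuning_hz / math.sqrt(4 * k**2 - 1)
    t_cx_s = 1.0 / (2.0 * rabi_hz)  # duration of a resonant pi rotation
    return rabi_hz, t_cx_s


for k in (1, 2, 3):
    rabi, t_cx = cx_drive_settings(39e6, k)  # J/h = 39 MHz from the text
    print(f"k={k}: Rabi frequency {rabi / 1e6:.2f} MHz, pulse length {t_cx * 1e9:.1f} ns")
```

With these settings the detuned transition returns to its starting state at the end of the pulse, so only the resonant transition sees a net π rotation. The actual Rabi frequencies used in the experiment are not given in this section.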

As a result of the pulsing, we observe a minor shift in the resonance 
frequency of both qubits, as observed before in Si/SiGe quantum dots.
We compensate the temporary change in resonance frequency by 
applying phase corrections to all following pulses (see Extended Data 
Fig. 9). In Fig. 4c, we show the effect of a controlled rotation on the 
driving effect can be observed, which we mitigate by tuning
$t_{CX} = t_{\pi,\mathrm{resonant}} = t_{2\pi,\mathrm{off\text{-}resonant}}$. c, Colour plot of $\Delta I_{SD}$ as a function of
Q1 CX-pulse length, $\theta_1$, and the phase of the second π/2-rotation on Q2, $\varphi_2$.
Owing to the $Z(\theta_1/2)$ rotation on the control qubit, a π phase shift can be
observed on Q2 for a conditional 2π rotation on Q1.


control qubit with applied phase corrections. We observe a larger signal
amplitude on Q1 after 0 and 4π rotations on Q2 as compared with a
2π rotation on Q2. This 4π periodicity is in agreement with fermionic
statistics and suggests an echoing pulse correcting residual
environmental coupling. The full π phase shift on Q2 for a conditional 2π
rotation on Q1, as a result of the $\theta_1/2$ phase that is accumulated by the control
qubit, demonstrates the application of a coherent CX gate.

The demonstration of a universal gate set with all-electrical 
control and without the need of any microscopic structures offers 
good prospects to scale up spin qubits using holes in strained ger- 
manium. The hole states do not suffer from nearby valley states, 
and the quantum dots are contacted by superconductors that may
be shaped into microwave resonators for spin-photon coupling. 
This provides opportunities for a platform that can combine semicon- 
ducting, superconducting and topological systems for hybrid technol- 
ogy with fast and coherent control over individual hole spins. Moreover, 
the demonstrated quantum coherence and level of control make 
planar germanium a natural candidate to engineer artificial Hamil- 
tonians for quantum simulation, going beyond classically tractable 
experiments. 
Methods


Fabrication process 

Our Ge/SiGe heterostructures are grown on a 100-mm n-type Si(001)
substrate, using an Epsilon 2000 (ASMI) RP-CVD reactor, as described
elsewhere. The device's Ohmic contacts and the electrostatic gates are
defined by electron beam lithography, electron beam evaporation
and lift-off of Al and Ti/Pd. Ohmic contacts consist of a 20-nm-thick
Al layer, followed by a 17-nm-thick Al₂O₃ gate dielectric grown by
atomic layer deposition at 300 °C. Next, the first layer of Ti/Pd (40 nm)
gates is deposited, followed by 17 nm of Al₂O₃ and the second layer of
overlapping Ti/Pd (40 nm) gates. Finally, vias contacting the lower
gate layer are etched through the top Al₂O₃ layer, followed by the
deposition of 1-µm-thick Al₉₉Si₁ bond pads to protect the device during bonding.


Experimental set-up 

All measurements are performed in a Bluefors dry dilution refrigerator
with a base temperature of 10 mK. Constant d.c. voltages are
applied with battery-powered voltage sources, and the voltages on
gates P1 and P2 are combined with an a.c. voltage by a bias-tee with a
cut-off frequency of 3 Hz. The a.c. voltage for gate P1 is generated by an
arbitrary waveform generator (AWG) Tektronix AWG5014C, combined
with a microwave signal generated by a Keysight PSG8267D vector
source. The a.c. voltage for gate P2 is solely the waveform generated
by the AWG. EDSR pulses are generated by the PSG8267D using the
internal IQ-mixer, driven by two output channels of the AWG. Both
qubits can be addressed by setting the vector source to an intermediate
frequency of typically 2.56 GHz, and IQ-mixing this with a
(co)sine wave generated on channels 3 and 4 of the AWG. Because the
on/off ratio of the IQ-modulation of our vector source is only 40 dB
and small residual output power may lead to added infidelity, we use
digital pulse modulation in series with the IQ modulation. The pulse
modulation is driven by the AWG and is turned on 15 ns before the first
pulse and turned off 7 ns after the last pulse in the sequence, resulting
in a total suppression of 120 dB when the source is off.

We typically apply a source-drain bias voltage of $V_{SD}$ = 0.3 mV and
measure the current through the device using an in-house-built
transimpedance amplifier, after which the signal is low-pass filtered at 10 kHz
and measured using a Stanford Research SR830 lock-in amplifier, as
described in Methods section ‘Sequence details’ below.


Virtual gates 

To allow independent control over the tunnel coupling and the charge
occupation of the double dot system, we make use of virtual gates. When
changing the different barrier gate voltages, linear corrections are applied
to the device's plunger gates to correct for the cross-capacitance between
the different gates. These coefficients are obtained from the relative
slopes of the charge-addition lines with respect to the different device
gates and normalized to the respective plunger gate coefficient. We write


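\[
\begin{pmatrix} VP1 \\ VP2 \end{pmatrix} =
\begin{pmatrix}
\alpha_{P1,P1} & \alpha_{P2,P1} & \alpha_{BC,P1} & \alpha_{BR1,P1} & \alpha_{BR2,P1} \\
\alpha_{P1,P2} & \alpha_{P2,P2} & \alpha_{BC,P2} & \alpha_{BR1,P2} & \alpha_{BR2,P2}
\end{pmatrix}
\begin{pmatrix} P1 \\ P2 \\ BC \\ BR1 \\ BR2 \end{pmatrix}
\]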

with VP1 and VP2 the virtual plunger gates, and P1, P2, BC, BR1 and BR2
the different physical device gates as indicated in Fig. 1a. The virtual
gate matrix describes the different couplings and is given by


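\[
\begin{pmatrix}
\alpha_{P1,P1} & \alpha_{P2,P1} & \alpha_{BC,P1} & \alpha_{BR1,P1} & \alpha_{BR2,P1} \\
\alpha_{P1,P2} & \alpha_{P2,P2} & \alpha_{BC,P2} & \alpha_{BR1,P2} & \alpha_{BR2,P2}
\end{pmatrix} =
\begin{pmatrix}
1 & 0 & 0.8 & 0.35 & 0 \\
0 & 1 & 0.8 & 0 & 0.4
\end{pmatrix}
\]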


We do not correct for the crosstalk between the two plunger gates,
such that $\alpha_{P2,P1} = \alpha_{P1,P2} = 0$. The crosstalk between the quantum dot and
the reservoir barrier of the other dot is negligible because of their
physical separation. Furthermore, it can be observed that the coupling
of the centre barrier to both dots ($\alpha_{BC,P1}$, $\alpha_{BC,P2}$) is approximately
twice as strong as that of the reservoir barriers, as a direct effect of its
increased size.
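As an illustration of how such a virtual gate matrix can be used, the sketch below computes the virtual plunger values from a set of physical gate voltages and applies the linear plunger corrections when a barrier gate is stepped. The coefficients are those of the matrix given above. The function names, the dictionary layout and the example voltages are hypothetical, and the sketch is not the authors' control software.

```python
# Cross-capacitance coefficients ALPHA[virtual plunger][physical gate].
ALPHA = {
    "VP1": {"P1": 1.0, "P2": 0.0, "BC": 0.8, "BR1": 0.35, "BR2": 0.0},
    "VP2": {"P1": 0.0, "P2": 1.0, "BC": 0.8, "BR1": 0.0, "BR2": 0.4},
}


def virtual_plungers(volts):
    """Map physical gate voltages to the virtual plunger values VP1 and VP2."""
    return {vp: sum(row[g] * volts[g] for g in row) for vp, row in ALPHA.items()}


def step_barrier(volts, gate, dv):
    """Change a barrier gate by dv and shift P1 and P2 so VP1 and VP2 stay fixed."""
    new = dict(volts)
    new[gate] += dv
    new["P1"] -= ALPHA["VP1"][gate] * dv
    new["P2"] -= ALPHA["VP2"][gate] * dv
    return new


# Hypothetical operating point, in millivolts.
volts = {"P1": -500.0, "P2": -480.0, "BC": -300.0, "BR1": -250.0, "BR2": -260.0}
stepped = step_barrier(volts, "BC", 10.0)

print(virtual_plungers(volts))
print(virtual_plungers(stepped))  # same virtual plunger values after compensation
```

Because the plunger-to-plunger crosstalk is set to zero and the plunger coefficients are normalized to 1, the compensation in this sketch is exact: a 10 mV step on BC is offset by an 8 mV shift on each plunger.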


Sequence details 

To improve the quality of the transport measurements, we establish a
lock-in measurement scheme in which the measurement of interest is
alternated with a reference measurement to account for slow variations
in the transport current through the device, as well as
temperature-dependent drifts in our transimpedance amplifier, as is illustrated in
Extended Data Fig. 1. The measurement cycle, consisting of the readout
as well as the manipulation phase, typically has a length of $t_{\mathrm{cycle}}$ ≈ 1 µs.
With the AWG, we generate a waveform that repeats the measurement
cycle N times, followed by N repetitions of a similar reference
measurement, with N chosen such that these cycles alternate at a lock-in
frequency of $f_{\mathrm{lock\text{-}in}}$ = 89.75 Hz. The measured transport current is then
demodulated by a lock-in amplifier, using a reference signal generated
by the AWG. As a result, the lock-in output signal will be directly related
to the difference in transport current between the measurement and
the reference cycle. During the readout, no differential current
is observed when the qubits are in their |↓↓⟩ ground state,
whereas a signal of typically $\Delta I_{SD}$ ≈ 0.3 pA is measured for all other spin
configurations and a total cycle length of $t_{\mathrm{cycle}}$ = 900 ns. This is in good
agreement with a bias current $\Delta I = 2e/t_{\mathrm{cycle}}$ = 0.4 pA, as expected for the
random loading of a hole spin.
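The numbers in this paragraph can be cross-checked with a short calculation. The sketch below assumes, for illustration only, that one lock-in period consists of exactly N measurement cycles followed by N reference cycles of equal length; the function names are hypothetical.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def repetitions_per_half_period(t_cycle_s, f_lockin_hz):
    """Number of cycles N that fit in half a lock-in period."""
    return round(1.0 / (2.0 * f_lockin_hz * t_cycle_s))


def loading_current_pa(t_cycle_s):
    """Current 2e / t_cycle, expressed in picoamperes."""
    return 2.0 * E_CHARGE / t_cycle_s * 1e12


n_rep = repetitions_per_half_period(900e-9, 89.75)
current_pa = loading_current_pa(900e-9)

print(f"N = {n_rep} cycles per half period")
print(f"2e / t_cycle = {current_pa:.2f} pA")
```

For a 900 ns cycle this gives a current of about 0.36 pA, consistent with the rounded 0.4 pA quoted above and with the measured signal of roughly 0.3 pA.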

For a Rabi experiment, the measurement cycle contains a single
microwave pulse of duration $t_p$, whereas the reference cycle has no pulses. In
the case of a Ramsey experiment, both the measurement and reference
cycle contain a π/2 pulse, a wait τ and a final π/2 pulse, but in the
reference cycle the final π/2 pulse is phase-shifted by φ = π. This will result
in an opposite projection for the two measurements and thereby
maximum differential signal. For the randomized benchmarking, a similar
scheme is used (see Fig. 2a), where the recovery pulse in the
measurement cycle is chosen to project to the spin-up state, while the recovery
pulse in the reference cycle projects to the spin-down state, resulting in
an exponential decay towards $\Delta I_{SD}$ = 0. Each data point is averaged over
approximately 10° repetitions of 1,500 randomly drawn gate sequences.
Finally, for the exchange measurements, we alternate a measurement
cycle where we apply a π and −π pulse to Q1 (Q2) before and after the
probing pulse respectively, with a reference cycle where Q1 (Q2) is not
pulsed. When the probing pulse is off-resonant with both resonance
frequencies, the measurement cycle gives effectively no rotation of Q1
(Q2) and the reference cycle does not result in any rotation. As a result
the demodulated signal will be zero. When the probing pulse frequency
is on resonance with the unprepared resonance frequency ($f_1$ or $f_3$), the
measurement cycle will still be an effective zero rotation on Q1 (Q2) due
to the selective driving and thus give no signal. The reference
cycle will now result in a π rotation on Q2 (Q1) and will therefore give a
signal, resulting in a negative demodulated signal. In the case where the
probing pulse is resonant with the prepared resonance line ($f_2$ or $f_4$), the
measurement cycle will generate a signal whereas the reference cycle
will give no signal, thus resulting in a positive demodulated signal. All
different pulse cycle configurations and the respective qubit projections
are illustrated in Extended Data Fig. 2b.


Phase corrections for pulsing 
We observe a shift of the resonance frequency of the qubits as a func- 
tion of the microwave driving power. We attribute this to a rectification 
of the microwave signal, resulting in a d.c. voltage pulse which can 
modulate the resonance frequency through the SOC and exchange 
interaction. As a result of the shift during the pulsing, each qubit picks 
up a phase when it is idling, as well as an additional phase due to the 
pulses on the other qubit. We can calibrate these frequency shifts and 
correct all following pulses to counteract this phase shift. 

To probe the effect of all possible pulses on all possible resonances, we
perform an extended Ramsey experiment. We prepare a pulse sequence
consisting of two π/2 pulses with a test gate (each of the four resonance
lines, as well as idling) and a π phase-shifted test gate in between, as
indicated in Extended Data Fig. 9. For the experiment on $f_2$ and $f_4$, we add an
additional preparation and projection pulse at the start and end
respectively, as indicated in grey in Extended Data Fig. 9. The back-and-forth
rotation on the test gate cancels any driving effects, as well as the θ/2
phase picked up due to the conditional rotation, and leaves us with only
the detuning phase. We now plot the transport current $\Delta I_{SD}$ as a function
of the phase φ of the second π/2-pulse, as well as the length of the test
gate. As a result of the frequency shift caused by the test gate, we observe
a phase shift increasing linearly with the length of the test gate. We fit
this phase shift for each gate, and we apply a correction to all following
gates. Extended Data Fig. 9 shows the phase evolution for all test gates
on all four resonance lines, both without corrections (Extended Data
Fig. 9a), as well as with corrections applied (Extended Data Fig. 9b).
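The bookkeeping behind such phase corrections can be sketched as a running phase frame per qubit. The example below assumes that each driven line shifts the frequency of each qubit by a calibrated amount δν for the duration τ of the pulse, so that the qubit accumulates a phase 2πδντ (with δν in Hz) which is then applied as a phase offset to its later pulses. The shift values, the sign convention, the data layout and the function name are all hypothetical; the source only states that a phase shift linear in the test gate length is fitted and corrected.

```python
import math


def frame_corrections(pulses, shift_hz):
    """Return the phase offset (rad) to apply to each pulse in a sequence.

    pulses   : list of (target_qubit, line, duration_s) tuples in time order
    shift_hz : shift_hz[line][qubit] is the frequency shift of `qubit`
               while `line` is being driven
    """
    frame = {"Q1": 0.0, "Q2": 0.0}  # phase accumulated so far by each qubit
    corrections = []
    for target, line, duration in pulses:
        # Correct this pulse by the phase its target qubit has picked up so far.
        corrections.append(frame[target] % (2.0 * math.pi))
        # The pulse itself shifts both qubits while it is on.
        for qubit in frame:
            frame[qubit] += 2.0 * math.pi * shift_hz[line][qubit] * duration
    return corrections


# Hypothetical calibration table (Hz) and pulse sequence.
shift_hz = {
    "f1": {"Q1": 1.0e6, "Q2": 0.5e6},
    "f3": {"Q1": 0.2e6, "Q2": 0.8e6},
}
pulses = [("Q1", "f1", 100e-9), ("Q2", "f3", 50e-9), ("Q1", "f1", 100e-9)]
corrections = frame_corrections(pulses, shift_hz)
print([round(c, 4) for c in corrections])
```

Tracking the correction in software in this way leaves the pulse shapes unchanged; only the phase of each subsequent microwave pulse is updated.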


Data availability 


All data underlying this study are available from the 4TU ResearchData
repository at https://doi.org/10.4121/uuid:95bc1f2e-0218-4c55-8e5b-2b59e8fcc5e6.


34. He, L., Bester, G. & Zunger, A. Electronic phase diagrams of carriers in self-assembled
quantum dots: violation of Hund’s rule and the Aufbau principle for holes. Phys. Rev. Lett. 
95, 246804 (2005). 

35. Reuter, D. et al. Coulomb-interaction-induced incomplete shell filling in the hole system 
of InAs quantum dots. Phys. Rev. Lett. 94, 026808 (2005). 

36. Hensen, B. et al. A silicon quantum-dot-coupled nuclear spin qubit. Nat. Nanotechnol. 
Preprint at http://arxiv.org/abs/1904.08260 (2019). 

37. Crippa, A. et al. Electrical spin driving by g-matrix modulation in spin-orbit qubits. Phys. 
Rev. Lett. 120, 137702 (2018). 


Acknowledgements We thank L. M. K. Vandersypen, S. Dobrovitski and J. Helsen for valuable 
discussions. We acknowledge support through a FOM Projectruimte of the Foundation for 
Fundamental Research on Matter (FOM), associated with the Netherlands Organisation for 
Scientific Research (NWO). 


Author contributions N.W.H. and D.P.F. performed the experiments. N.W.H. fabricated the 
device. A.S. and G.S. supplied the heterostructures. N.W.H., D.P.F. and M.V. wrote the 
manuscript with the input of all other authors. M.V. conceived and supervised the project.


Competing interests The authors declare no competing interests. 


Additional information 

Supplementary information is available for this paper at https://doi.org/10.1038/s41586-019-1919-3.

Correspondence and requests for materials should be addressed to M.V.

Reprints and permissions information is available at http://www.nature.com/reprints. 



Extended Data Fig. 1 | Instrumentation set-up for the lock-in transport
measurements. Illustration of the set-up and relevant signals for the lock-in
transport measurements. The AWG is used to generate alternating pulse cycles
consisting of a repeated measurement and a repeated reference. The signal is
demodulated in a lock-in amplifier to give a direct measure of the difference
between the two measurements and subtract slow variations in the transport
signal.
Extended Data Fig. 2 | Pulse cycles used for the transport measurements.
a, Pulse cycles used for the randomized benchmarking experiments. The
measurement pulse cycle consists of m gates randomly drawn from the Clifford
group and a final Clifford gate projecting the qubit onto the spin-up state.
The reference pulse cycle consists of the same m Clifford gates and a different
final Clifford gate projecting the qubit onto the spin-down state. Each cycle is
repeated N times, and a series of typically k = 50 independent randomly drawn
measurement and reference pulse cycles are alternated. These k = 50 different
draws are thus hardware-averaged on the lock-in amplifier, and the entire
experiment is repeated and averaged 30 times, yielding a total of approximately 10°
repetitions of 1,500 different randomly drawn Clifford sequences of length m.
An example of the qubit evolution for each pulse cycle is plotted on the Bloch
sphere below. b, Pulse cycles used for the exchange mapping experiments. The
measurement pulse cycle consists of a broad preparation and restoring pulse
applied to Q1 (Q2), around a probing pulse. The reference
pulse cycle consists solely of the probing pulse. The qubit evolutions for
the different resonance conditions are plotted on the Bloch sphere and
illustrate the different signals measured in Fig. 4c, d.
Extended Data Fig. 3 | Demonstration of qubit operation at a second hole
occupancy. a, Charge stability diagram showing the (m, n) hole occupancy
used during all experiments in the main text, as well as the (m, n+1) occupancy
for which we observe PSB as well. For an unpolarized filling of the quantum
dots, one expects an alternating suppression of the transport current due to
PSB, as spin blockade occurs only when an orbital level is fully occupied.
However, the spin-filling for holes is known to be highly polarized (refs 34, 35), and
therefore PSB can occur in sequential quantum dot fillings. b, Coherent Rabi
oscillations measured in the (m, n+1) occupancy. A slight linear offset is
observed for Q1, which can be attributed to the microwave power. We note that,
for the same microwave power, the Rabi frequency of Q2 in the (m, n+1)
occupancy is increased substantially compared to the (m, n) filling. We
attribute this to the hole being in a different orbital, where the effective SOC
may be different.
Extended Data Fig. 4 | Qubit resonance frequencies as a function of
magnetic field. Colour plot indicating the transport current ΔI through the
double dot system, as a function of external magnetic field B and the
frequency f of the applied microwave signal. We have numerically subtracted
the mean of each row and column in each of the three individual colour plots, to
account for the slow drifts in transport current, as well as the line resonances in
our fridge cabling. The two bright lines indicate an increase in the transport
current due to the microwave rotating either spin and thus lifting PSB.
Extended Data Fig. 5 | Temporal dependence of the resonance frequency.
We track the resonance frequency of both Q1 and Q2 over the time of
approximately 110 h. We observe that the qubit frequency remains remarkably
stable over this period, but do observe discrete, uncorrelated steps in the
resonance frequencies of both qubits. The resonance frequency of Q1 only
shows steps of Δf = 2 MHz between two distinct levels, whereas for Q2 we
observe steps of both Δf = 1 MHz and Δf = 2 MHz, between three different levels,
as also becomes apparent from the histogram. The origin of these steps could
be, for example, the slow loading and unloading of charge traps, which
manipulates the qubit resonance frequency through the change in electric
field, or hyperfine coupling to a nearby nuclear spin (ref. 36).
Extended Data Fig. 6 | Magnetic field dependence of the driving speed of Q1.
a, b, Rabi frequency dependence on the applied microwave power P in arbitrary
units, for B = 0.5 T (a) and B = 1.65 T (b). Multiple mechanisms can be at play for
the EDSR driving of the spins (ref. 37) and these are typically all linearly dependent on
B. As a result of this, considerably higher driving frequencies can be reached at
higher magnetic fields. We note that the exact microwave power cannot be
compared between the two measurements, owing to the strong frequency
dependence of the attenuation of our fridge lines.
Extended Data Fig. 7 | Relaxation, dephasing and coherence times. We
perform a Ramsey experiment, in which two π/2 pulses are separated by time τ,
during which the qubit will evolve as a result of the implemented detuning. We
fit the decay of the observed oscillations to
$\Delta I_{SD} = a\cos(2\pi\Delta f\,\tau + \varphi)\exp[-(\tau/T_2^*)^{\alpha^*}]$,
with $a$ a scaling factor, $\Delta f$ the detuning and φ a phase offset, and find a spin
coherence time of $T^*_{2,Q1}$ = 833 ns and $T^*_{2,Q2}$ = 419 ns and decay coefficients of
$\alpha^*_{Q1}$ = 1.2 ± 0.2 and $\alpha^*_{Q2}$ = 1.5 ± 0.2, for Q1 and Q2, respectively. The spin coherence
can be extended by performing a Hahn echoing sequence, consisting of π/2, π
and π/2 pulses separated by waiting times τ. Fitting the observed decay as a
function of the total waiting time 2τ to a power law
$\Delta I_{SD} = a\exp[-(2\tau/T_2^{H})^{\alpha^{H}}]$,
we find extended coherence times of $T^{H}_{2,Q1}$ = 1.9 µs and $T^{H}_{2,Q2}$ = 0.8 µs and
decay coefficients of $\alpha^{H}_{Q1}$ = 1.5 ± 0.1 and $\alpha^{H}_{Q2}$ = 2.5 ± 0.3, for Q1 and Q2, respectively.
Finally, we perform a measurement of the spin lifetime by applying a single π
pulse, after which we wait for a time τ. We fit the decay to $\Delta I_{SD} = \exp[-(\tau/T_1)]$ and
find lifetimes of $T_{1,Q1}$ = 9 µs and $T_{1,Q2}$ = 3 µs.
Extended Data Fig. 8 | Relaxation time $T_1$ as a function of gate voltage on the
tunnel barriers between dot and reservoir. a, b, The relaxation time $T_1$ of the
dots increases approximately exponentially as a function of the respective
dot-reservoir gate voltage, for Q1 (a) as well as for Q2 (b). The relaxation time of
Q1 increases exponentially from $T_1$ < 1 µs to $T_1$ > 10 µs, and a similar scaling is
observed for Q2. For even smaller dot-reservoir couplings, the transport signal
drops below our measurement limit, but switching to charge sensing could
allow a further increase in $T_1$.
Extended Data Fig. 9 | Phase corrections on the qubits. a, Extended Ramsey
experiment on each of the four resonance lines, using five different test gates
between the π/2 pulses to observe the effect on the resonance frequency.
A linear phase shift as a function of test gate pulse length τ can be observed for
some lines, indicating a frequency shift during the pulsing. b, We compensate
for this effect by performing a software update of φ = δν τ to each additional
pulse, with δν the frequency shift of the qubit as a result of the microwave
signal.
Although two-dimensional (2D) atomic layers, such as transition-metal
chalcogenides, have been widely synthesized using techniques such as exfoliation
and vapour-phase growth, it is still challenging to obtain phase-controlled 2D
structures. Here we demonstrate an effective synthesis strategy via the progressive
transformation of non-van der Waals (non-vdW) solids to 2D vdW transition-metal
chalcogenide layers with identified 2H (trigonal prismatic)/1T (octahedral) phases.
The transformation, achieved by exposing non-vdW solids to chalcogen vapours, can
be controlled using the enthalpies and vapour pressures of the reaction products.
Heteroatom-substituted (such as yttrium and phosphorus) transition-metal
chalcogenides can also be synthesized in this way, thus enabling a generic synthesis
approach to engineering phase-selected 2D transition-metal chalcogenide structures
with good stability at high temperatures (up to 1,373 kelvin) and achieving
high-throughput production of monolayers. We anticipate that these 2D transition-metal
chalcogenides will have broad applications for electronics, catalysis and energy
storage.


Two-dimensional (2D) atomic-layer crystals have demonstrated many
unique physical and chemical properties as well as broad applications
in electronics, sensors, catalysts and batteries. Generally, 2D
structures such as graphene, boron nitride and transition-metal sulfides can
be produced via a top-down approach, that is, by directly exfoliating
the vdW counterparts through mechanical, liquid-phase and
electrochemical procedures. In this manner, various vdW materials (such
as metal oxides, hydroxides and topological insulators) can also
be synthesized, enriching the 2D family of materials. In these 2D vdW
nanocrystals, the elemental compositions, stoichiometric ratios and
structural phases are usually inherited from their parent bulk
counterparts, although 2D nanocrystals with phase-specific structures such
as 1T and 2H phases are difficult to synthesize selectively. Here we
demonstrate an efficient topological conversion of non-vdW solids
such as transition-metal carbides and nitrides under chalcogen vapours
to 2D vdW transition-metal chalcogenide layers with identified 2H/1T
phases, good stability at high temperatures (<1,373 K) and achieving
high-throughput production of monolayers. We anticipate that the
resultant transition-metal chalcogenide layers with favourable
features would have broad applications for electronics, energy storage
and conversions.

In the past decade, some unusual 2D nanocrystals have emerged from
non-vdW solids such as haematite or bulk layered transition-metal
carbides and nitrides, namely MAX phases, greatly increasing the
number of 2D material compositions accessible. In particular, the non-vdW
MAX phases (where M represents a transition-metal element, A usually
represents an element from groups 13-16 of the periodic table and X
is carbon or nitrogen) have predominantly mixed covalent or ionic
M-X bonds and metallic M-A bonds. Because the M-A bonds are
more chemically active than the M-X bonds, A species in MAX phases
can be extracted using highly reactive solvents (hydrogen fluoride and
strong bases), allowing few-layer-thick 2D transition-metal carbides,
carbonitrides and nitrides, called MXenes, to be created. These 2D
nanocrystals are usually terminated with defects and surface
terminations of -OH, -O, -F or -Cl. Owing to the very close atomic packing
and strong chemical bonds in non-vdW solids, it remains a challenge
to convert them to 2D nanocrystals with abundant exposed surfaces
and identified phases.

Here we demonstrate an efficient strategy that enables us to convert
a family of non-vdW bulk solids such as MAX phases to 2D
transition-metal chalcogenides with well-defined phases. As depicted in Fig. 1a,
under chalcogen-containing vapours (H$_y$Z, where Z represents sulfur,
selenium or tellurium and y is 0 or 2) at high temperatures, non-vdW
MAX phases and transition-metal borides, silicides and carbides
(Supplementary Fig. 1) have high activities. In particular, the active M-A
bonds in MAX phases react easily with chalcogen-containing gases,
resulting in products of AZ and MZ compositions. Such reactions must
produce an AZ intermediate product at high vapour pressure, which
would allow rapid evaporation rates, thus boosting the continuous
reaction into the bulk of the reactant material. Thermodynamically,
if the reaction temperature were high enough, all the
post-transition-metal A (Si, Al, Sn, Ge) species in MAX phases could be transformed to
metal chalcogenide gases (Supplementary Figs. 2-5), which facilitates
the conversion of MAX phases to 2D nanostructures. As an example,
based on temperature-vapour pressure relationships (Fig. 1b and
Supplementary Fig. 6), germanium chalcogenides (GeS, GeSe)
Fig. 1 | Schematic illustration of the conversion of non-vdW solids to 2D vdW
transition-metal chalcogenides. a, Non-vdW solids such as MAX phases are
progressively transformed to 2D transition-metal chalcogenides via a
topological conversion reaction (MAX + H$_y$Z (gas) → MZ + AZ), in which M


have higher vapour pressures than other chalcogenide materials 
(SiS, Al,S3, SnS) at 1,073 K; from this we expect Ge-containing MAX 
phases to be easily converted to 2D transition-metal chalcogenides 
with vdW layers (Fig. 2c and Supplementary Fig. 7). According to the 
Clausius—Clapeyron equation” (see equation (1) in Methods), upon 
increasing the temperature further to more than 1,100 K, other Si-, 
Sn- and Al-containing MAX phases should also serve as precursors 
for the generation of 2D transition-metal chalcogenides (Supple- 
mentary Fig. 8), owing to the increased vapour pressures of the prod- 
ucts at higher reaction temperatures. Using this principle, we have 
synthesized 13 transition-metal chalcogenides (Supplementary 
Tables 1 and 2), including 7 binary chalcogenides (based on the Ti-, 
Nb-, Mo- and Ta-containing MAX phases and MXenes), 5 heteroatom- 
doped chalcogenides with selected 2H phase or IT phase and one 
composite (based on quaternary MAX phases). This demonstrates that 
our synthetic protocol is versatile, enabling the efficient conversion 
of alarge number of non-vdW bulk solids to 2D transition-metal chal- 
cogenides. Notably, although the resulting 2D structures are derived 
from bulk MAX phases, their compositions and stoichiometric ratios 
are very different from the parent compositions and are also differ- 
ent from other products commonly derived from MAX phases, such 
as MXenes.


2D transition-metal chalcogenides (2H/1T) 

As a proof of concept, we produced 2D transition-metal dichalcogenide
(TMD)-MoS2 nanocrystals by converting MAX-Mo2GeC under hydrogen
sulfide (H2S) gas at 1,073 K (Fig. 2a; see Methods). X-ray diffraction
patterns reveal the disappearance of Mo2GeC peaks in the product (Fig. 2b).
Instead, a series of diffraction peaks at 14.1°, 32.7°, 39.5° and 58.3° are
well indexed to the (002), (100), (103) and (110) facets of hexagonal
MoS2 (according to Joint Committee on Powder Diffraction Standards
(JCPDS) Card No. 37-1492), demonstrating the complete conversion and
removal of Ge layers from MAX-Mo2GeC during our synthetic process.
The resulting product exhibits uniform structure with largely extended 
spacing in the whole scanning electron microscope image (Fig. 2c and 
Supplementary Fig. 9), similar to those reported for expanded graphite 
and MXenes. Transmission electron microscopy (TEM) (Fig. 2d and
Supplementary Fig. 10) and high-resolution TEM (Fig. 2e) confirm 
clearly the highly exfoliated nanocrystals with a uniform interplanar 
represents an early-transition-metal element, A is an element from groups
13-16, X is C, N, B or Si, and Z refers to S, Se and Te, associated with
volatile AZ products. b, Temperature-vapour pressure relationships for
various AZ substances.


spacing of 0.28 nm, in good agreement with the spacing between the
(100) facets of 2H MoS2.
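The indexing of the diffraction peaks can be cross-checked against the lattice spacing seen in the high-resolution TEM image. The following is a minimal sketch that applies Bragg's law to the four reported 2θ values. The X-ray wavelength is an assumption (Cu Kα, 0.15406 nm), because the text does not state the radiation source, and the function name `d_spacing_nm` is illustrative only.

```python
import math

# Assumption: Cu K-alpha radiation; the wavelength is not stated in the text.
CU_K_ALPHA_NM = 0.15406


def d_spacing_nm(two_theta_deg, wavelength_nm=CU_K_ALPHA_NM, order=1):
    """Interplanar spacing from Bragg's law: n * lambda = 2 * d * sin(theta)."""
    theta = math.radians(two_theta_deg / 2.0)
    return order * wavelength_nm / (2.0 * math.sin(theta))


# Peak positions (2-theta, degrees) and their assigned facets, as reported.
peaks = {"(002)": 14.1, "(100)": 32.7, "(103)": 39.5, "(110)": 58.3}
spacings = {hkl: d_spacing_nm(two_theta) for hkl, two_theta in peaks.items()}

for hkl, d in spacings.items():
    print(f"{hkl}: d = {d:.3f} nm")
```

Under this assumed wavelength, the (100) reflection corresponds to a spacing of about 0.27 nm, which is in line with the 0.28 nm interplanar spacing measured by high-resolution TEM.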

To identify the interior structure of the resultant MoS2, we conducted
an ultrathin sectioning experiment. Many nanosheets with thicknesses
from 0.4 nm to 4 nm are visible (Supplementary Fig. 11), indicating the
co-existence of monolayers, bilayers and few layers in the sample. To
inhibit the restacking of already-expanded 2D MoS2 and improve the
fraction of monolayers, the converted samples were rapidly transferred
to a low-temperature zone during our synthetic process. Thus, the
fraction of monolayer MoS2 can be improved to 31% from 9% based on our
standard conversion (Supplementary Figs. 12-16). Remarkably, when
we directly converted thin non-vdW solid MXene-Mo2CTx in H2S gas,
the fraction of monolayer MoS2 was up to about 91% (Supplementary
Fig. 17). Raman spectra of the accordion-like MoS2 (Supplementary
Figs. 18 and 19) show two typical peaks at 379 cm⁻¹ and 405 cm⁻¹,
corresponding to the in-plane E2g^1 and out-of-plane A1g vibrational modes
of 2H MoS2 (Supplementary Fig. 67), respectively. On the basis of
thermodynamic considerations, by raising the reaction temperature
to more than 1,100 K, accordion-like TiSe2 could also be derived from
MAX-Ti3SiC2 by substituting H2S with selenium vapour (Fig. 2f-h,
Supplementary Figs. 20-24), owing to the high vapour pressure of the SiSe
product at such high temperatures (Supplementary Fig. 4). An
atomic-resolution scanning transmission electron microscopy (STEM) image
of TiSe2 (Fig. 2h) reveals the 1T superlattice with metal sites located at
the centres of octahedral units. Such transformations suggest that our
synthetic protocol can be generalized to convert non-vdW solids to vdW
2D nanocrystals with identified 2H/1T phases and high-throughput
production of monolayers (Supplementary Figs. 25-36).


2D heteroatom-doped chalcogenides (2H/1T) 

More than 70 ternary MAX phases and some new quaternary MAX
phases such as (W2/3Y1/3)2AlC and (Ti1/2Nb1/2)2AlC have been explored,
suggesting that it may be feasible to produce a series of transition-metal
chalcogenides with multiple compositions via our topological conversion
approach. One possibility is to produce accordion-like Y-doped WS2
with the 2H phase (Fig. 3a, b and Supplementary Figs. 37-39) based
on a (W2/3Y1/3)2AlC precursor. After conversion with fast quenching,
highly expanded accordion-like Y-doped WS2 can be obtained, in which
the fraction of monolayers is up to 27% (Supplementary Figs. 40-45).
Fig. 2 | Structural characterization of 2D transition-metal chalcogenides
derived from MAX phases. a, Schematic illustration of the conversion of
MAX-Mo2GeC to accordion-like MoS2 under H2S gas at 1,073 K. b, X-ray
diffraction patterns of MAX-Mo2GeC and accordion-like MoS2. c-e, Scanning
electron microscope (c), sectional TEM (d) and high-resolution TEM (e) images of


High-resolution elemental mapping images (Fig. 3f) reveal that Y species
are mainly monoatomically dispersed into the layers, associated
with a few Y clusters, possibly accounting for the many Y-S bonds
present (Supplementary Fig. 46). A typical 2H structure is visualized in an
aberration-corrected TEM image (Fig. 3c), consistent with X-ray
diffraction (Supplementary Fig. 38) and Raman analysis (Fig. 3h), indicating
that the heteroatoms remain stable within the host 2D transition-metal
chalcogenide structure. To further verify the Y-S bonds in Y-doped
WS2, we conducted X-ray absorption near-edge fine structure (XANES)
spectroscopy measurements. In the case of Y-doped WS2, the absorption
edge in Y K-edge XANES spectra (Fig. 3i) is close to that of yttrium
sulfide, suggesting that the Y is in the sulfide state. The Fourier
transform spectra resulting from the analysis of Y-doped WS2 by extended
X-ray absorption fine structure (EXAFS) spectroscopy (Fig. 3j) show
a dominant peak at 2.2 Å, indexed to the Y-S bonds in comparison
with yttrium sulfide. Theoretically, the energy of 2H WS2 is much lower
than that of its 1T phase (by more than 0.6 eV per formula unit). With
increasing Y doping levels, the energy difference between the 2H and 1T
phases noticeably decreases, but it remains difficult to reverse the
relative stability (Supplementary Fig. 68). Owing to the substitution of
W (valency +4) in WS2 by low-valence Y (+3), the charge densities near the
Fermi level tend to be localized around Y atoms, thus going against electron
accordion-like MoS2. f, Crystalline structure of 1T TiSe2. g, Scanning electron
microscope image and sectional TEM image (inset) of accordion-like TiSe2,
indicating its highly expanded structure. h, Atomic-resolution STEM image of
TiSe2 layers and the corresponding atomic configuration (inset) of the 1T
phase. Purple and green balls represent Ti and Se atoms, respectively.


transfer (Supplementary Fig. 69). Moreover, the increase of defect 
scattering by Y dopants would also reduce the electrical conductivity. 
Thus, the electrical conductivity of Y-doped WS2 is experimentally
measured to be 2.31 x 10° S cm“, much lower than that of pure WS, 
(8.13 x 10S cm”) (Supplementary Table 3). This is in contrast to those 
reported for d orbital electron-enriched metal (Re, Nb)-doped MoS2
and our Nb-doped TiSe2 (Supplementary Figs. 47-50), which exhibit
substantial improvements in the electrical conductivities (Supplementary
Figs. 51 and 70 and Supplementary Table 3).


2D heteroatoms (Y, P) co-doped WS2 (1T)


While engineering the quaternary MAX phase (W2/3Y1/3)2AlC, other
vapours such as phosphorus could be easily and simultaneously
introduced into the synthetic system, creating Y and P co-doped
WS2 (Supplementary Figs. 52-56). As illustrated by the Raman spectra
(Fig. 4d), for Y, P co-doped WS2 there are three dominant peaks
at 130 cm⁻¹ (J1), 258 cm⁻¹ (J2) and 406 cm⁻¹ (J3), corresponding to the
vibration modes of the 1T phase, as well as two peaks at 348 cm⁻¹
and 414 cm⁻¹, indexed to the E2g^1 and A1g peaks of the 2H phase,
respectively. The presence of the 1T phase in real space can be visualized
via aberration-corrected STEM images (Fig. 4a and Supplementary
Fig. 3 | Structural characterization of 2D heteroatom-doped transition-
metal chalcogenides with 2H phase derived from quaternary MAX phases.
a, Schematic illustration of the conversion of quaternary MAX-(W2/3Y1/3)2AlC to
Y-doped WS2. b, Typical scanning electron microscope image of Y-doped WS2,
showing the expanded structure. c, d, High-resolution TEM (c) and high-angle
annular dark-field (d) images of Y-doped WS2 layers. e-g, Elemental mapping
images of W (e), Y (f) and S (g) species in Y-doped WS2 layers. h, Raman spectra
of Y-doped WS2 and bulk WS2. i, j, Y K-edge XANES spectra (i) and Fourier
transform spectra (j) of Y K-edge EXAFS for Y-doped WS2, demonstrating the
presence of Y-S bonds in Y-doped WS2.
Fig. 4 | Structural characterization and electrical properties of 2D
heteroatom (Y and P) co-doped WS2 with 1T phase derived from quaternary
MAX-(W2/3Y1/3)2AlC. a, Atomic-resolution STEM image and its corresponding
fast Fourier transform patterns (inset). b, Atomic configuration of Y,
P co-doped WS2, exhibiting the 1T phase. c, Line intensity profile along the
highlighted arrow in a. d, Raman spectra of Y, P co-doped WS2. e, Fourier
transform spectra of Y K-edge EXAFS. f, P K-edge XANES spectra for Y,
P co-doped WS2. g, Current versus voltage curves of Y, P co-doped WS2, Y-doped
WS2 and exfoliated WS2.
Fig. 57). Combined with the intensities of cation and anion sites, the S
and W elements can be identified using image contrast, as shown by
the line intensity profile (Fig. 4c) acquired along the highlighted arrow
in Fig. 4a. All the metal W atoms are located at the centres of
octahedral units, in accordance with the atomic models of the 1T phase
(Fig. 4b). Y elemental mapping images (Supplementary Figs. 58a and
59a) show that all the Y atoms are located at the centres of octahedral
units, occupying the W sites in WS2. This can be further demonstrated
via the line intensity profiles (Supplementary Figs. 58c, 58e and 59c),
in which the intensities of the Y sites are weaker than those of the W
sites and stronger than those of the S sites, directly demonstrating
the presence of Y-S bonds in Y, P co-doped WS2. By carefully analysing
the Y, P elemental mapping images and the line intensity profiles
(Supplementary Figs. 58f, 58g, 59d-i), it is clear that the P atoms are
located on the top of both Y and S sites, verifying the presence of Y-P
and P-S bonds in the sample. These bonds can be further demonstrated
via Fourier transform spectra of EXAFS (Fig. 4e and Supplementary
Fig. 60), where there are two dominant peaks between 1.3 Å and 2.8 Å,
corresponding to the overlap of Y-O, Y-S and Y-P bonds at 1.7 Å, 2.2 Å
and 2.4 Å, respectively. In the P K-edge XANES spectra of Y, P co-doped
WS2 (Fig. 4f), there is one prominent peak centred at 2,158 eV, attributed
to the overlap of P-Y bonds (2,157 eV) and P-S bonds (2,159 eV), in good
agreement with the above EXAFS, aberration-corrected STEM images
and the corresponding elemental mapping analysis.

Density functional theory calculations confirm that when P atoms
adsorb on the hollow site 2 (h2), sulfur site 2 (S2) and tungsten site (W),
the energy differences between 2H and 1T WS2 are negligible. However,
when P atoms adsorb on the top of the Y atoms and their neighbours
such as hollow site 1 (h1) and sulfur site 1 (S1), the energies of the 1T WS2
are surprisingly lower than those of the 2H phase, revealing that there
is a unique yttrium-phosphorus (Y-P) joint effect that stabilizes the
configuration of the 1T phase (Supplementary Figs. 71 and 72).
Furthermore, it is difficult to reverse the relative energy stability
between 2H and 1T WS2 by independent Y-doping or P-adsorption (Supplementary
Figs. 61 and 73). Even after storage under ambient conditions for about
one year, the 1T-containing WS2 remains stable (Supplementary Fig. 62),
unlike the 1T transition-metal dichalcogenides produced via traditional
methods that have poor stability at high temperatures (>573 K). Such
Y, P co-doped WS2 exhibits a linear current-voltage (I-V) characteristic
with a low resistance of 387 kΩ per square, close to that reported for 1T'
WS2 (430 kΩ per square) and about three orders of magnitude lower than
those of 2H Y-doped WS2 (413 MΩ per square) and exfoliated WS2 (124 MΩ
per square) (Fig. 4g, Supplementary Fig. 63 and Supplementary Table 3).
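The "orders of magnitude" comparison can be made explicit with a short calculation. This is an illustrative sketch that uses only the four resistance values quoted above; the dictionary keys and variable names are ad hoc labels, not identifiers from the study.

```python
import math

# Resistances quoted in the text, in ohms per square.
resistance_ohm = {
    "Y, P co-doped WS2 (1T)": 387e3,
    "1T' WS2 (literature)": 430e3,
    "Y-doped WS2 (2H)": 413e6,
    "exfoliated WS2": 124e6,
}

reference = resistance_ohm["Y, P co-doped WS2 (1T)"]

# log10 of the resistance ratio: how many decades each sample sits above
# the Y, P co-doped WS2 film.
decades_above = {
    name: math.log10(value / reference) for name, value in resistance_ohm.items()
}

for name, decades in decades_above.items():
    print(f"{name}: {decades:+.2f} decades relative to Y, P co-doped WS2")
```

With these numbers the 2H Y-doped sample lies about 3.0 decades above the co-doped film and the exfoliated sample about 2.5 decades above it, so "three orders of magnitude" holds closely for the former and approximately for the latter.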

The accordion-like structure, highly exposed surfaces and abun- 
dant 1T phase (Fig. 4d and Supplementary Fig. 64) of the resultant 
transition-metal dichalcogenides mean that they could be directly 
used as electrocatalysts for the hydrogen evolution reaction (Supple- 
mentary Figs. 65 and 66). We believe that our synthetic protocol has the 
potential to convert a series of non-vdW solids to 2D vdW nanocrystals 
with selected phases, achieving high-throughput monolayers, specific 
dopants and tailored electronic features, as well as broad applications 
in fields such as electronics, catalysis and energy storage. 
Methods


Synthesis of MAX phases 

Some MAX phases were synthesized by ball-milling of commercially
available powders and subsequent calcination treatments. Taking
Mo2GeC as an example, commercial Mo, Ge and graphite in a molar ratio
of 2:1.05:1 were sealed in an agate container with agate balls and milled
at 600 rpm for 20 h. The mixture was then heated at a rate of 3 K min⁻¹
until it reached 1,673 K and was maintained at this temperature
for 4 h. After cooling to room temperature, the bulk was ground to
produce MAX-Mo2GeC. For other MAX phases, the details are listed in
Materials and Methods in the Supplementary Information.
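For planning purposes, the furnace time implied by a linear ramp plus a dwell can be estimated as below. This is only a sketch: the starting temperature of 298 K is an assumption (the text says only that samples are cooled to room temperature afterwards), cooling time is ignored, and the helper names are hypothetical.

```python
# Assumption: the ramp starts from room temperature, taken here as 298 K.
ROOM_TEMPERATURE_K = 298.0


def ramp_minutes(target_k, rate_k_per_min, start_k=ROOM_TEMPERATURE_K):
    """Duration of a linear heating ramp, in minutes."""
    if rate_k_per_min <= 0:
        raise ValueError("heating rate must be positive")
    return (target_k - start_k) / rate_k_per_min


def total_hours(target_k, rate_k_per_min, dwell_h, start_k=ROOM_TEMPERATURE_K):
    """Ramp time plus dwell time, in hours (cooling is not included)."""
    return ramp_minutes(target_k, rate_k_per_min, start_k) / 60.0 + dwell_h


# Mo2GeC calcination as described: 3 K/min to 1,673 K, then a 4 h hold.
mo2gec_hours = total_hours(1673.0, 3.0, 4.0)
print(f"Mo2GeC calcination: about {mo2gec_hours:.1f} h before cooling")
```

Under the stated assumption the ramp alone takes roughly 7.6 h, so the calcination step occupies the furnace for close to 12 h before cooling begins.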


Synthesis of 2D transition-metal chalcogenides 

Transition-metal sulfides were prepared by the reaction of MAX phases
or MoB with H2S gas at temperatures of 1,073-1,373 K. Taking
TMD-MoS2 as an example, 300 mg of MAX-Mo2GeC was heated at a
rate of 10 K min⁻¹ under Ar flow, and an H2S/Ar (10 vol.% H2S) mixture
was injected when the temperature reached 1,073 K. MAX-Mo2GeC was
maintained at this temperature for 4 h, generating TMD-MoS2.
Transition-metal selenides were prepared by the reaction of MAX phases,
MoB and MoSi2 with Se vapour at temperatures of 1,073-1,373 K.
Specifically, 2 g of Se powder and 300 mg of MAX-Ti3SiC2 were placed in
low- (973 K) and high- (1,173-1,273 K) temperature zones, respectively.


Synthesis of 2D heteroatom-doped transition-metal 
chalcogenides 

Heteroatom-doped transition-metal chalcogenides were synthesized by
the reaction of quaternary MAX phases with chalcogen-containing gases.
Specifically, 300 mg of MAX-(W2/3Y1/3)2AlC was heated at a rate
of 10 K min⁻¹ under Ar flow, and then the H2S/Ar mixture was injected when
the temperature reached 1,273 K. MAX-(W2/3Y1/3)2AlC was maintained
there for 4 h to produce Y-doped WS2. Nb-doped TiSe2 was derived from
MAX-(Ti1/2Nb1/2)2AlC using the same procedures as for making TMD-TiSe2.


Synthesis of 2D heteroatoms (Y and P) co-doped WS2

Y, P co-doped WS2 was synthesized by the reaction of MAX-(W2/3Y1/3)2AlC
with H2S gas and P vapour at a high temperature of 1,273 K.
Specifically, 300 mg of (W2/3Y1/3)2AlC and 1 g of P were placed into two
separate crucibles, where P powder was placed in a low-temperature upstream
zone maintained at 873 K. (W2/3Y1/3)2AlC was heated at a rate of
10 K min⁻¹ under Ar flow, and then the H2S/Ar mixture was injected when
the temperature reached 1,273 K. (W2/3Y1/3)2AlC was maintained at 1,273 K
for 4 h to produce Y, P co-doped WS2. Similarly, P-doped MoS2 was
prepared by the reaction of MAX-Mo2GeC with H2S gas and P vapour
at a high temperature of 1,073 K.


Synthesis of P-doped WS2

P-doped WS2 was synthesized by the reaction of bulk WS2 with P vapour
at high temperature. Specifically, 300 mg of WS2 powder was put in
a porcelain boat with 1 g of P at the upstream zone. Then the boat was
heated to 1,273 K at a rate of 10 K min⁻¹ under Ar flow and kept
there for 30 min to generate P-doped WS2.


Fabrication of thin films of Y, P co-doped WS2, Y-doped WS2 and
exfoliated WS2

Y, P co-doped WS2 thin film was fabricated by vacuum filtration of Y,
P co-doped WS2 nanosheets on nylon membrane filters; the nanosheets were
acquired by a facile liquid exfoliation of accordion-like Y, P co-doped
WS2 in isopropyl alcohol. Other thin films of Y-doped WS2,
WS2, Nb-doped TiSe2 and TiSe2 were similarly obtained.


Characterization 
The morphology and microstructure of materials were characterized by
scanning electron microscopy (Zeiss MERLIN Compact), transmission
electron microscopy (JEOL 2100F), spherical aberration-corrected
transmission electron microscopy (FEI Titan G2) and X-ray diffraction
(Rigaku D/MAX2200pc). Raman spectra were recorded on a Renishaw
inVia Microscopic confocal Raman spectrometer using a 532-nm laser
beam. X-ray photoelectron spectra were recorded on a Thermo
Electron ESCALAB 250 XPS spectrometer. Atomic force microscopy
measurements were carried out on a Dimension ICON scanning probe
microscope (Veeco/Bruker). X-ray absorption near-edge fine structure
(XANES) and extended X-ray absorption fine structure (EXAFS) data
for the Y K-edge were collected on BL14W1 and BL1W1B at the Shanghai
Synchrotron Radiation Facility and the Beijing Synchrotron Radiation
Facility, respectively. XANES data for the P K-edge were collected on
BL4B7B at the Beijing Synchrotron Radiation Facility.
Current-versus-voltage measurements were conducted using the two-electrode
method on an electrochemical workstation (CHI760E, CH Instruments)
in a voltage range of -1 V to 1 V at a scan rate of 10 mV s⁻¹. The electrical
conductivities of powder samples were investigated on a four-probe
powder resistivity tester (ST2722-SZ, Suzhou Jingge Electronic Co., Ltd).


Vapour pressure calculations 

With the aid of $\log K_f$ values for the two-phase equilibria solid-gas or
liquid-gas (AZ(sol,liq) ⇌ AZ(gas)), we calculated the vapour pressures of AZ
gases according to the following equation:

$$\log K = \log K_f(\mathrm{AZ_{gas}}) - \log K_f(\mathrm{AZ_{sol,liq}}) = \log\left[\,p(\mathrm{AZ_{gas}})/a(\mathrm{AZ_{sol,liq}})\,\right]$$

where $p$ is the vapour pressure, $a$ is the activity of AZ in the condensed
phase, $K_f$ is the equilibrium constant of the formation reaction, and the
$\log K_f$ values taken from the literature are partially listed in
Supplementary Tables 4-7. In general, $a = 1$ for pure substances in a
condensed phase.

For the evaporation of pure substances at a given temperature $T$:

$$\log K_f(\mathrm{AZ_{gas}}) - \log K_f(\mathrm{AZ_{sol,liq}}) = \log p(\mathrm{AZ_{gas}})$$

and

$$p(\mathrm{AZ_{gas}}) = 10^{\,\log K_f(\mathrm{AZ_{gas}}) - \log K_f(\mathrm{AZ_{sol,liq}})}$$

where $p$ is in units of bar (1 bar = $10^5$ Pa). Taking $1/T$ and $\ln p$ as
the horizontal and vertical axes, respectively, linear curves were plotted as
shown in Supplementary Fig. 6. According to the Clausius-Clapeyron equation:

$$\ln p = -\frac{\Delta H_v}{RT} + C \qquad (1)$$

where $p$ is the vapour pressure, $\Delta H_v$ is the molar enthalpy of
evaporation, $R$ is the molar gas constant and $C$ is a constant, these linear
curves confirm that the Clausius-Clapeyron equation can be applied to evaluate
the relationship between vapour pressure and temperature.
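The procedure above can be sketched in a few lines of Python: convert tabulated log K_f values to vapour pressures, then fit ln p against 1/T to check linearity and read off the enthalpy of evaporation. The input numbers below are synthetic placeholders generated from an assumed enthalpy and constant; they are not values from Supplementary Tables 4-7, and the function names are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def vapour_pressure_bar(log_kf_gas, log_kf_condensed):
    """p(AZ_gas) in bar, assuming unit activity of the condensed phase."""
    return 10.0 ** (log_kf_gas - log_kf_condensed)


def fit_clausius_clapeyron(temps_k, pressures):
    """Least-squares fit of ln p = -dH/(R*T) + C; returns (dH in J/mol, C)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(p) for p in pressures]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope * R, intercept


# Synthetic placeholder inputs (NOT data from the study): log K_f values
# constructed so that ln p follows equation (1) with the assumed parameters.
ASSUMED_DH = 200e3  # J/mol, hypothetical
ASSUMED_C = 20.0    # dimensionless, hypothetical
temps_k = [900.0, 1000.0, 1073.0, 1100.0, 1200.0]
log_kf_condensed = [0.0 for _ in temps_k]
log_kf_gas = [
    (-ASSUMED_DH / (R * t) + ASSUMED_C) / math.log(10.0) for t in temps_k
]

pressures_bar = [
    vapour_pressure_bar(g, c) for g, c in zip(log_kf_gas, log_kf_condensed)
]
dh_fit, c_fit = fit_clausius_clapeyron(temps_k, pressures_bar)
print(f"fitted dH = {dh_fit / 1e3:.1f} kJ/mol, C = {c_fit:.2f}")
```

With real log K_f tables in place of the placeholders, the quality of the straight-line fit indicates how well equation (1) describes a given AZ species over the temperature range of interest.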

In the case of AZ substances that lack $\log K_f$ values in the
literature, the boiling point and the corresponding vapour pressure
(1 atm = 101,325 Pa) are selected to evaluate the relationship between
vapour pressure and temperature (Supplementary Table 8).
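One way to use a boiling point as the anchor is the integrated two-point form of the Clausius-Clapeyron equation, p(T) = p_b · exp[-(ΔH_v/R)(1/T - 1/T_b)]. The sketch below assumes that a molar enthalpy of evaporation is also available for the substance, which the text does not state explicitly; the boiling point and enthalpy used in the example are hypothetical placeholders, not entries from Supplementary Table 8.

```python
import math

R = 8.314462618     # molar gas constant, J mol^-1 K^-1
ATM_PA = 101_325.0  # 1 atm in Pa, the pressure at the normal boiling point


def pressure_from_boiling_point(temp_k, boiling_point_k, dh_vap_j_per_mol,
                                p_boil_pa=ATM_PA):
    """Vapour pressure at temp_k from the integrated Clausius-Clapeyron form,
    anchored at the boiling point (where the pressure equals p_boil_pa)."""
    exponent = -dh_vap_j_per_mol / R * (1.0 / temp_k - 1.0 / boiling_point_k)
    return p_boil_pa * math.exp(exponent)


# Hypothetical substance: boiling point 1,500 K, dH_v = 150 kJ/mol.
T_BOIL_K = 1500.0
DH_VAP = 150e3

p_1073 = pressure_from_boiling_point(1073.0, T_BOIL_K, DH_VAP)
p_1100 = pressure_from_boiling_point(1100.0, T_BOIL_K, DH_VAP)
print(f"p(1,073 K) = {p_1073:.1f} Pa, p(1,100 K) = {p_1100:.1f} Pa")
```

The form treats ΔH_v as constant between T and T_b, which is the same approximation that underlies the linear ln p versus 1/T plots.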


Data availability 


The data that support the findings of this study are available from the 
corresponding authors on reasonable request. 
Ubiquitous processes in nature and industry exploit crystallization from
multicomponent environments; however, laboratory efforts have focused on the
crystallization of pure solutes and the effects of single growth modifiers.
Here we examine the molecular mechanisms employed by pairs of inhibitors in
blocking the crystallization of haematin, which is a model organic compound
with relevance to the physiology of malaria parasites. We use a combination
of scanning probe microscopy and molecular modelling to demonstrate that
inhibitor pairs, whose constituents adopt distinct mechanisms of haematin
growth inhibition, kink blocking and step pinning (refs 12, 13), exhibit both
synergistic and antagonistic cooperativity depending on the inhibitor
combination and applied concentrations. Synergism between two crystal growth
modifiers is expected, but the antagonistic cooperativity of haematin
inhibitors is not reflected in current crystal growth models. We demonstrate
that kink blockers reduce the line tension of step edges, which facilitates
both the nucleation of crystal layers and step propagation through the gates
created by step pinners. The molecular viewpoint on cooperativity between
crystallization modifiers provides guidance on the pairing of modifiers in
the synthesis of crystalline materials. The proposed mechanisms indicate
strategies to understand and control crystallization in both natural and
engineered systems, which occurs in complex multicomponent media. In a
broader context, our results highlight the complexity of crystal-modifier
interactions mediated by the structure and dynamics of the crystal interface.


Crystallization is the central process of materials synthesis in biological,
geological and extraterrestrial systems. Nature achieves a remarkable
diversity of shapes, patterns, compositions and functions of the arising
crystalline structures by combining simple strategies to control the
number of nucleated crystals and their anisotropic rates of growth.
To promote or inhibit crystallization in both natural and engineered
environments, soluble foreign compounds are deployed that interact
with the solute or the crystal/solution interface. In many cases, two
or more modifiers operate in tandem to alter the processes of
crystallization; however, the fundamental modes of cooperative action
are not well understood.

To gain molecular-level insight into the mechanisms of cooperativity
between crystallization modifiers, we examine the growth of
β-haematin crystals, which form in malaria parasites as a part of their
haem-detoxification mechanism, in the presence of quinoline
compounds, which represent a major class of the currently employed
antimalarials. Recent work has established that β-haematin crystal
growth follows classical mechanisms, whereby new layers nucleate on
the crystal surfaces and advance by incorporation of solute molecules
at the steps. These studies uncovered two distinct classes of quinoline
inhibition of step propagation. In the first mechanism, known as 'step
pinning', chloroquine (CQ) and quinine (QN; Fig. 1a) bind to flat terraces
and arrest crystal formation over broad areas of the crystal surface
(Fig. 1b). In the second mechanism, kink blocking, amodiaquine (AQ) and
mefloquine (MQ; Fig. 1a) block kinks, the sites where haematin molecules
incorporate into steps (Fig. 1c).

Even though combinations of two or more crystal growth inhibitors
are common in many drug formulations, a crucial gap in the
understanding of interactions between inhibitor pairs that regulate
haematin crystallization has been identified. To address the
molecular mechanism of action of binary inhibitor combinations
on β-haematin crystal growth, we pair a step pinner, CQ or QN, with
a kink blocker, MQ or AQ. We classify the cooperativity between
paired inhibitors as synergistic, additive or antagonistic according
to whether the response to a combination of two inhibitors is,
respectively, stronger than, equal to or weaker than the sum of the
responses to individual doses.
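The classification rule just stated can be written down directly. The sketch below compares the response to a pair with the sum of the individual responses; the tolerance band used to decide what counts as "equal", the function name and the example numbers are all assumptions for illustration, and responses are taken to be fractional inhibitions (for example, 1 - l/l0 for crystal length).

```python
def classify_cooperativity(response_a, response_b, response_pair,
                           tolerance=0.05):
    """Classify a pair of inhibitors by comparing the combined response with
    the sum of the responses to the individual doses.

    `tolerance` is an assumed band within which the two are treated as equal;
    in practice it would reflect the experimental uncertainty.
    """
    expected_additive = response_a + response_b
    if response_pair > expected_additive + tolerance:
        return "synergistic"
    if response_pair < expected_additive - tolerance:
        return "antagonistic"
    return "additive"


# Hypothetical fractional inhibitions, for illustration only.
examples = [
    (0.20, 0.30, 0.70),  # pair inhibits more than the sum of the parts
    (0.20, 0.30, 0.50),  # pair matches the sum
    (0.20, 0.30, 0.25),  # pair inhibits less than the sum
]
labels = [classify_cooperativity(a, b, pair) for a, b, pair in examples]
print(labels)
```

Isobolograms and combination indices address the same question in terms of doses required for a fixed level of inhibition rather than responses at fixed doses.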

Binary inhibitor combinations impose dramatic changes in the
shapes and dimensions of β-haematin crystals (Fig. 1f-i, Extended Data
Fig. 1). The crystal length along the c crystallographic axis is the result
of growth in the [011] and [O11] directions (Fig. 1d, e). The shorter
average length enforced by both MQ and CQ than by either modifier
separately indicates a strong synergistic activity of these two inhibitors
(Fig. 1f). As the crystal length is insensitive to the presence of MQ alone,
Fig. 1 | Cooperativity between four pairs of inhibitors in suppressing bulk
growth of β-haematin crystals. a, Structures of step pinners CQ and QN and
kink blockers MQ and AQ. b, Schematic of step pinning, where Δx is the
separation between inhibitor molecules (shown in gold) adsorbed on flat
crystal terraces and Rc is the critical radius of the two-dimensional nucleus.
Step growth is delayed if Δx is comparable to 2Rc and arrested if Δx < 2Rc.
c, Schematic of inhibitors (shown in blue) inhibiting step advancement by
partial blocking of access of solute molecules to kinks. d, Scanning electron
microscopy micrograph and schematic illustrating the β-haematin crystal
shape. l, length; w, width. e, Preservation of the crystal shape during growth in
pure solutions and inhibitor-induced suppression of crystal length or width by

additive cooperativity of CQ and MQ would engender crystal lengths similar
to those constrained by only CQ. By contrast, the crystal lengths
affected by the pairing of AQ and CQ are substantially longer than those
engendered by only CQ, implying an antagonistic cooperativity
between these two modifiers. The addition of either MQ or AQ to
CQ-containing solutions enforces greater crystal widths than those
with only CQ (Fig. 1g). The crystal width increases owing to growth in
the ⟨010⟩ directions (Fig. 1d); thus, greater widths indicate that the
MQ/CQ and AQ/CQ pairs impede growth of {010} faces to a lesser extent
than CQ on its own. We previously reported that MQ and AQ weakly
affect the crystal width; therefore, these new findings indicate
antagonistic cooperativity of CQ with kink blockers MQ and AQ in inhibiting
the width of β-haematin crystals. Notably, in select inhibitor
concentration ranges (for example, C_CQ < 1 μM and C_MQ < 4 μM) synergism in
suppressing growth along the c axis accompanies antagonistic
cooperativity towards growth in the b direction (Fig. 1f, g); the opposite
responses are probably defined by the selective binding of the
inhibitors to the individual crystal faces dictated by their distinct
structures. Importantly, they further weaken the synergistic
cooperativity of CQ and MQ in inhibiting haematin sequestration into
interaction of inhibitors with axial and lateral crystal faces, respectively. Red
spheres denote inhibitors of the c face; green spheres denote inhibitors of
the b face. f-i, Variations of the average length l and width w of crystals grown
in the presence of increasing concentrations of four inhibitor pairs at the
displayed ratios relative to l0 and w0, reached after growth in pure
growth solutions for 16 d at 23 °C. Error bars represent the standard deviation
of about 30 measurements. Lines are guides for the eye. In all experiments,
haematin concentration cH = 0.28 mM and supersaturation σ = ln(cH/ce) = 0.93,
where ce = 0.11 mM is the solubility at 23 °C. The majority of the length and
width data for individual modifiers are from Olafson et al. and are consistent
with additional measurements of the effects of QN.


crystals. Combining MQ and AQ with QN elicits mostly synergistic
responses of both the crystal length (Fig. 1h) and width (Fig. 1i).
Antagonistic cooperativity between crystallization inhibitors
appears counterintuitive. To understand the effects of inhibitor
combinations on the molecular processes of growth of the (100) face
of β-haematin crystals, we used time-resolved in situ atomic force
microscopy (AFM). We scrutinized inhibitor effects on the rate of
two-dimensional nucleation of new crystal layers J2D and the rate of
propagation of steps v. For J2D, we counted the number of new layer
nuclei that grow above a critical radius Rc per unit area of the surface
and unit time (Fig. 2a). We determined v from the displacement of the
steps over time (Fig. 2a). The correlation between J2D and the
concentration of the inhibitors demonstrates that the addition of the kink
blockers MQ and AQ to the step pinner CQ substantially enhances the
nucleation of new layers relative to that with solitary CQ, indicating
strong antagonism (Fig. 2b). The cooperativity between CQ/MQ and
CQ/AQ in suppressing v is antagonistic at almost all tested inhibitor
concentrations (Fig. 2c, Extended Data Fig. 2b, Extended Data Table 1).
MQ and AQ exhibit a similar transition towards stronger antagonism
when combined with QN. MQ, which alone does not suppress J2D,
Fig. 2 | Cooperativity of inhibitor pairs in suppressing layer generation and
spreading. a, Time-resolved in situ AFM images showing the nucleation and
growth of new layers on a (100) face at cH = 0.28 mM and supersaturation
σ = ln(cH/ce) = 0.56, where ce = 0.16 mM is the solubility at 28 °C, the temperature
in the AFM liquid cell. Arrows indicate newly nucleated islands that are counted
to determine the rate of two-dimensional nucleation, J2D. The growth of the
island dimension l underlies the determination of the step velocity, v. The
bright lines with striations at the top and bottom of some of the panels
correspond to the crystal edges. b-e, Decrease in J2D relative to that in the
absence of any inhibitor, J2D,0 (b, d) and of v relative to that in the absence of any
inhibitor, v0 (c, e) with increasing concentrations of CQ (b, c) and QN (d, e)
inhibitor pairs at the displayed ratios. Error bars represent the standard


exhibits synergistic cooperativity with QN at C_MQ < 4 μM and antagonism
at C_MQ > 4 μM (Fig. 2d). Similarly, AQ, which on its own depresses J2D by
up to 60%, transitions from synergy at C_AQ < 2 μM to antagonism
at C_AQ > 4 μM. Both MQ and AQ strongly inhibit the step velocity v when
acting alone, and the similarity between velocity profiles measured in
the presence of QN/MQ and QN/AQ combinations to those obstructed
by only QN (Fig. 2e) signifies strong antagonism between MQ and QN
and between AQ and QN. The cooperativity of inhibitor pairings can
be quantified from isobolograms (Fig. 2f, g), an established method
in pharmaceutical research, in which the doses of paired inhibitors
needed to inhibit J2D and v by a certain percentage are compared with
the sum of the responses to each inhibitor applied individually.
We establish that the antagonism between step pinners and kink
blockers in inhibiting bulk crystallization and the surface processes on
(100) faces is not caused by the formation of inhibitor-haematin
complexes in solution. We examined whether the constituents of an
inhibitor pair formed binary complexes that do not impede
crystallization. Such complexation would lower the concentration of the active
inhibitor and constrain its potency. We tested the formation of
CQ/MQ, CQ/AQ, QN/MQ and QN/AQ binary complexes. Considering that




deviation from the average of 15 to 25 measurements of J2D and 25 to 35
measurements of v, and are, in some cases, smaller than the symbol size. Lines
are guides for the eye. Data for individual modifiers are from Olafson et al.
f, g, Isobolograms characterizing the inhibition of v by QN/MQ (f) and
CQ/MQ (g). Open symbols indicate the concentrations of individual inhibitors
that elicit a certain percentage of inhibition, referred to as inhibitory
concentrations (ICs). Dashed lines correspond to additive cooperativity
between the paired inhibitors for a certain percentage of inhibition (10-50%).
Solid symbols represent the concentrations of the paired inhibitors that evoke
the same inhibition. Rightward shifts of the solid symbols from the respective
dashed lines indicate antagonistic cooperativity. The corresponding
combination index values are listed in Extended Data Table 1.


the four inhibitors form complexes with haematin, we also explored
whether these four combinations assemble into ternary compounds 
that include haematin. The results presented in Extended Data Fig. 3 
show that no complexes involving both inhibitors exist in the solution, 
and imply that complexation between the applied inhibitors is not the 
source of the observed antagonistic cooperativity. 

Additive and synergistic cooperativity in suppressing $J_{2D}$ and $v$
between a kink blocker and a step pinner can be understood within the
realm of common crystal growth models. Blocking of kinks lowers the
kinetic constant for growth, which works in parallel with the depression
of the crystallization driving force due to step curvature enforced by
step pinners (Fig. 1a, b). We posit that the antagonism between the two
types of inhibitors originates from the reduction of the step line tension
$\gamma$, a thermodynamic prerequisite for the adsorption of kink blockers
at steps. On the basis of the Gibbs-Thomson relation, $\gamma$ regulates the
radius of the critical two-dimensional nucleus according to $R_c = \Omega\gamma/\Delta\mu$
(ref. *!), where $\Omega$ is the molecular volume, $\Delta\mu = k_B T \ln(c_H/c_e)$ is the
chemical potential difference between the solution and the crystal,
$k_B$ is the Boltzmann constant, $T$ is the temperature, $c_H$ is the haematin
concentration and $c_e$ is the solubility. In turn, lower $\gamma$ and $R_c$ stimulate
Fig. 3 | Characterization of the effects of the kink blockers MQ and AQ on
layer nucleation. a, Time-resolved in situ AFM images showing growing
(I and II) and dissolving (III) islands on a (100) face at $c_H$ = 0.28 mM and
supersaturation $\sigma = \ln(c_H/c_e)$ = 0.56. b, c, Dependence of the radius of the
critical two-dimensional nucleus $R_c$ on the crystallization driving force
$\Delta\mu = k_B T \ln(c_H/c_e)$ in pure haematin solution and in the presence of MQ (b) and
AQ (c). Error bars represent the standard deviation from the average of 25 to 30
measurements. Solid lines are plots of the Gibbs-Thomson relation $R_c = \Omega\gamma/\Delta\mu$
with step line tension $\gamma$ = 25 mJ m$^{-2}$ for pure haematin and 20 and 22 mJ m$^{-2}$ for
MQ and AQ, respectively. Data for pure haematin are from Olafson et al.
faster layer nucleation as $J_{2D} = J_0 \exp[-\pi\gamma R_c h/(k_B T)]$ ($h$ = 1.2 nm is the
step height) and expedite step propagation in the gaps between
the adsorbed step pinners (Fig. 1b). We developed (Supplementary
Sections 3, 4, Extended Data Figs. 5, 6) an analytical model of the combined
action of step pinners and kink blockers on step propagation and
analysed the consequences of the presence of two types of inhibitors
on the nucleation of a new crystal layer (Supplementary Section 5).
This analysis indicates that the classical synergistic effects dominate
at low concentrations of either inhibitor, whereas the proposed
mechanism of antagonism sets in at high concentrations; stronger
antagonism between step pinners and kink blockers is predicted for
their joint action on $J_{2D}$ than on $v$ (Supplementary Section 5).
Both predictions are borne out by the $J_{2D}$ and $v$ correlations (Fig. 2b-e).
Data on layer nucleation in the presence of MQ or AQ demonstrate
that $\gamma$ decreases in the presence of either of these inhibitors and that the
measured $\Delta\gamma$ correlates with the inhibition of step motion due to association
of these inhibitors with the kinks. From AFM images, we directly
measured $R_c$ in the presence of 2.5 µM MQ or AQ. This parameter represents
the critical size of a two-dimensional nucleus of a crystal layer
below which nuclei tend to dissolve, whereas nuclei larger than $R_c$ have
a greater probability to grow (Fig. 3a). We monitored the evolution of
25 to 30 layer nuclei at each value of $\Delta\mu$ and inhibitor concentration,
where $\Delta\mu$ was varied by the selection of the haematin concentration
$c_H$ (Fig. 3b). The relation between $R_c$ and $\Delta\mu$ (Fig. 3c, d) is reciprocal,
consistent with the Gibbs-Thomson relation, and reveals that the presence
of MQ and AQ lowers $\gamma$ from a nominal value of 25 ± 2 mJ m$^{-2}$ to
20 ± 2 and 22 ± 1 mJ m$^{-2}$, respectively. In Methods, we discuss statistical
tests that certify the distinction of the three $\gamma$ values and relate
decreasing $\gamma$ to association of MQ and AQ with the kinks. We assume
the two kink blockers adsorb to the steps following a Langmuir-type
law. In Supplementary Sections 1, 2, we evaluate $-\Delta\gamma$ using the Gibbs
equation of adsorption, $\Gamma = -d\gamma/d\mu_B$, where $\Gamma$ is the amount of inhibitor
adsorbed at kinks and $\mu_B = \mu_B^0 + k_B T \ln c_B$ is the chemical potential
of the kink blocker, MQ or AQ, at concentration $c_B$ (ref. *). From these
relations and Extended Data Fig. 4, Extended Data Tables 3, 4, we
obtain $\Delta\gamma$ = −3 mJ m$^{-2}$ for both MQ and AQ, in good agreement with the
values for these two inhibitors assessed from the $R_c(\Delta\mu)$ correlations


Fig. 4 | Solid-on-solid kinetic Monte Carlo
modelling of the action of kink blockers and step
pinners on step propagation. a, Kink blockers
(magenta spheres) associate with kinks and
incorporate in the crystal. b, Dependence of the step
velocity $v$ relative to that in pure solution $v_0$ on the
concentration of kink blockers $\rho_{kink\ blocker}$ relative to
$\rho_{total}$, the summed concentration of solute and kink
blockers. c, Step pinners (gold spheres) adsorb on the
terraces between steps and enforce curved steps.
d, Dependence of the step velocity $v$ relative to that in
pure haematin solution $v_0$ on the surface density of
step pinners, $\rho_{step\ pinner}$. Error bars in b and d represent
the standard error of the simulations, evaluated as
discussed in Methods. e, Step pinners adsorbed on
the surface arrest step advancement. Four numbered
step pinners mark the step location. f, g, Addition of
kink blockers stimulates the growth of a step stalled
by step pinners; g presents a later moment of the
same simulation as in f. h, Magnified view of a step
squeezed between stoppers 1 and 3, showing kink
blockers associated with kinks in the growing step
segment.


(Fig. 3b, c). These $\Delta\gamma$ invoke an equivalent contraction of $R_c$ (ref. °).
As a 20% decrease in $R_c$ is equivalent to a 1.44-fold (1.2$^2$) lowering of the
surface coverage of adsorbed step pinners, and given that $J_{2D}$ and $v$ are
highly sensitive functions of both $c_{CQ}$ and $c_{QN}$, the decrease in $\gamma$ elicits
a disproportionately strong response of $v$ and $J_{2D}$.

In situ AFM measurements were complemented with kinetic Monte
Carlo simulations to test the generality of the proposed model of antagonistic
cooperativity between two classes of crystallization inhibitors.
We developed a solid-on-solid model for step growth, in which
molecules associate with and dissociate from steps. For simplicity, we
ignored surface diffusion on the terraces. The rate of solute association
depends on the supersaturation, whereas the probability of detachment
is dictated by the bonds a molecule forms with its neighbours
(Supplementary Video 1). We assume that kink blocker adsorption and
detachment are analogous to those of solute molecules, and that the relevant
dynamics are governed by their concentration and the number and
strength of the bonds at an adsorption site (we assume that two of
the lateral bonds are stronger and that the remaining two are weaker
than for the solute molecules). These assumptions lead to preferential
binding to the kinks at steps (Fig. 4a) and constrained $v$ (Fig. 4b,
Supplementary Video 2). We assume that step pinners bind strongly to
the crystal surface, but exhibit no interactions with crystal molecules
parallel to that plane. The surface is decorated with a square array of
step pinners, which remain static throughout the simulation (Fig. 4c,
Supplementary Video 3); previous results have demonstrated that
the step-pinner surface distribution has no effect on the step velocity.
Remarkably, the calculated correlations between $v$ and inhibitor
concentrations are akin to those observed experimentally for the kink
blockers MQ and AQ, for which $v$ levels off at around 50% inhibition
(Fig. 4b), as well as the step pinners CQ and QN, which induce complete
growth arrest at moderate inhibitor concentrations (Fig. 4d).

Combining step pinners at a concentration above the threshold for 
complete growth arrest (Fig. 4e) with kink blockers allows steps to 
advance through pinned sites, thereby re-establishing layer growth 
(Fig. 4f, g, Supplementary Videos 4, 5). The simulations reveal that 
at the microscopic level, the antagonistic cooperativity is due to the 
stabilization of step edge fluctuations by associating kink blockers. 
Steps overcome the pinner palisade by fluctuations that penetrate the 
gaps between the pinners (Fig. 4c, Supplementary Video 5). Closely 
spaced pinners suppress the extent and lifetime of the fluctuations 
and restrain step growth. The blockers bind to the kink-rich fingers 
embodying the fluctuations (Fig. 4h) and increase the fluctuation lifetime.
At the macroscopic level, the stabilized fluctuations manifest as
a decrease in $\gamma$. Indeed, an attenuated $\gamma$ enforces shorter $R_c$, which, in
turn, allows step progress between the pinners (Fig. 4f-h).

In summary, we present a mechanism of antagonistic cooperativ- 
ity between crystallization inhibitors by which kink blockers attenu- 
ate the step line tension and facilitate step propagation through the 
palisade of step pinners. This mechanism may provide guidance in 
the search for suitable inhibitor combinations to control crystallization
of pathological, biomimetic and synthetic materials. In a broader
context, our results highlight modifier interactions mediated by the 
dynamics and structures on the crystal interface as a prime element 
of the regulation of the shapes and patterns of crystalline structures 
in nature and industry. 
Methods


Solution preparation 

The following compounds were purchased from Sigma Aldrich: porcine
haematin (≥98%), citric acid (anhydrous, >99.5%), sodium hydroxide
(anhydrous, >98%), n-octanol (anhydrous, >99%),
chloroquine diphosphate (98%), quinine (anhydrous, >98.0%), amodiaquine
dihydrochloride dihydrate and mefloquine hydrochloride
(anhydrous, ≥98.0%). All reagents were used as received. Deionized
water was produced by a Millipore reverse osmosis-ion exchange system
(Rios-8 Proguard 2-MilliQ Q-guard).

Citric buffer at pH 4.80 was prepared by dissolving 50 mM of citric 
acid in deionized water and titrating the solution, under continuous 
stirring, with 0.10 M NaOH to the desired pH. The buffer pH was veri- 
fied before each experiment and fresh buffers were prepared every 
month. We placed 5 ml of citric buffer at pH 4.80 in direct contact with 
n-octanol at 23 °C and allowed 30 min for equilibration. The upper 
portion of the two-phase system was decanted and denoted as citric 
buffer-saturated octanol (CBSO). 

For this study, we used four antimalarial drugs: QN, CQ, AQ and MQ.
Solid QN and MQ were added to CBSO and the solutions reached the
desired concentration after 2-4 d. AQ and CQ were added in excess to
CBSO and stored in the dark for 30-45 d, allowing the concentrations to
approach the respective solubilities. All drug solutions were filtered
through 0.2 µm nylon membrane filters and the concentrations were
determined by ultraviolet-visible spectrometry using a Beckman DU
800 spectrophotometer and extinction coefficients and wavelengths
listed in Ketchum et al.

Haematin solutions were prepared by dissolving haematin powder
in 8 ml of freshly made CBSO and heating it to 70 °C for 7-9 h. The
solution was filtered through a 0.2 µm nylon membrane filter and
the concentration was determined using an extinction coefficient
of 3.1 ± 0.1 cm$^{-1}$ mM$^{-1}$ at a wavelength of 594 nm (refs. ?*).
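The concentration determination follows the Beer-Lambert law, c = A/(εl). A minimal sketch is given below; it assumes the extinction coefficient reads 3.1 cm⁻¹ mM⁻¹, a 1 cm optical path and an optional dilution factor, none of which is stated explicitly in the text, and the function name is hypothetical.

```python
EPSILON_HAEMATIN = 3.1  # cm^-1 mM^-1 at 594 nm (value as read from the text)


def concentration_mM(absorbance, epsilon=EPSILON_HAEMATIN, path_cm=1.0, dilution=1.0):
    """Beer-Lambert estimate of concentration in mM.

    path_cm and dilution are assumptions made for this illustration:
    a 1 cm cuvette and no dilution before the measurement.
    """
    if absorbance < 0 or epsilon <= 0 or path_cm <= 0 or dilution <= 0:
        raise ValueError("absorbance must be non-negative; other inputs positive")
    return dilution * absorbance / (epsilon * path_cm)


# An absorbance of 0.868 at 594 nm would correspond to about 0.28 mM haematin.
print(round(concentration_mM(0.868), 3))
```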


Characterization of the combined inhibitor effects on bulk 
haematin crystallization 

We adopted the procedure reported by Olafson et al. to produce
haematin crystals from supersaturated haematin solution in CBSO.
We tested crystal growth in the presence of four drug combinations,
CQ/MQ, CQ/AQ, QN/MQ and QN/AQ, with constant ratios between
the two constituents of 1:4, 1:2, 1:2 and 1:2, respectively. Drug combinations
were added to the haematin stock solution to achieve final total
inhibitor concentrations ranging from 0 to 15 µM while maintaining a
constant haematin concentration ($c_H$ = 0.28 mM). The vials were then
shaken until the solution was well mixed. A 15-mm-diameter glass slide
was scratched in the centre and placed at the bottom of the vial in contact
with the supersaturated solution. Vials were capped and placed
in an incubator at 23 °C with minimal exposure to light. β-Haematin
crystals were observed in 1-2 d and reached their maximum length
after around 2 weeks. The glass slide with attached haematin crystals
was collected, washed with deionized water and ethanol, dried with
nitrogen gas and then coated with 10-20 nm gold for scanning electron
microscopy. The length and width of about 30 crystals at each
composition were measured to assess the effectiveness of inhibitor
combinations.


In situ monitoring of the haematin crystal evolution 

We used a multimode atomic force microscope (Nanoscope IV)
from Digital Instruments for all AFM experiments. AFM images were
collected in tapping mode using Olympus TR800PSA probes (silicon
nitride, Cr/Au coated 5/30, 0.15 N m$^{-1}$ spring constant) with a tapping
frequency of 32 kHz. Image sizes ranged from 300 nm to 20 µm. Scan
rates were between 1 and 2.52 s$^{-1}$. Height, amplitude and phase imaging
modes were employed. The captured images contained 256 scan lines
at angles depending on the orientation of the monitored crystal.


The temperature in the fluid cell reached a steady value of 27.8 ± 0.1 °C
within 15 min of imaging. This value was higher than room temperature
(around 22 °C) owing to heating by the AFM scanner and laser.

β-Haematin crystals were grown on glass disks as described above.
The density of attached haematin crystals was monitored under an
optical microscope. We ensured similar crystal density for all samples to
minimize potential depletion of inhibitors due to high crystal number.
The glass slides were mounted on AFM sample disks (Ted Pella) and the
samples were placed on the AFM scanner. Haematin solution in CBSO
with a concentration of 0.28 mM was prepared less than 2 h in advance.
This solution was loaded into the AFM liquid cell using 1 ml disposable
polypropylene syringes (Henck Sass Wolf), tolerant of organic solvents.
After loading, the system was left standing for 10-20 min to thermally
equilibrate. The crystal edges were identified to determine the orientation
and the crystallographic directions on the upward-facing (100)
crystal surface. The crystals were kept in contact with the solution for
0.5-1.5 h to allow their surface features to adapt to growth conditions.
We set the scan direction parallel to the [001] crystallographic direction
and AFM images were collected for 3-5 h. The solution in the AFM fluid
cell was refreshed every 30 min to maintain constant concentration.
For studies of modifiers, growth solutions were replaced with ones
containing a selected antimalarial inhibitor(s). With each modifier
concentration, AFM images were collected for 2 to 4 h, during which
the solution was replenished several times. Solution without modifier
was pumped into the AFM cell and the observed crystal was allowed to
grow uninhibited for about 30 min before another modifier concentration
was introduced.

The evolution of the haematin crystal surface was characterized
by the velocity of growing steps $v$ and the rate of two-dimensional
nucleation of new crystal layers $J_{2D}$. To determine $v$, we monitored the
displacements of 8-13 individual steps with a measured step height
$h$ = 1.17 ± 0.07 nm. Between 25 and 35 measurements were taken
for each individual step and the average growth rates were reported.
To determine $J_{2D}$, the appearance of new islands on the surface between
successive images was monitored and the number of islands that grew
was counted. This number was scaled with the imaged area and the time
interval between images to yield $J_{2D}$. From 15 to 25 measurements were
averaged for each $J_{2D}$ determination.
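The two observables described above reduce to simple averages over image pairs. The following is a minimal sketch of that bookkeeping; the function names, units (nm, µm², s) and the example numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean


def step_velocity(displacements_nm, dt_s):
    """Average step velocity (nm/s) from step displacements measured
    between successive images separated by dt_s seconds."""
    if dt_s <= 0:
        raise ValueError("time interval must be positive")
    return mean(d / dt_s for d in displacements_nm)


def nucleation_rate(growing_islands, area_um2, dt_s):
    """Average 2D nucleation rate (islands per µm² per s): the number of new
    islands that grew, scaled by the imaged area and the time interval."""
    if area_um2 <= 0 or dt_s <= 0:
        raise ValueError("area and time interval must be positive")
    return mean(n / (area_um2 * dt_s) for n in growing_islands)


# Hypothetical example: three displacement readings and two island counts.
print(step_velocity([10.0, 12.0, 14.0], dt_s=60.0))
print(nucleation_rate([4, 6], area_um2=4.0, dt_s=50.0))
```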

The goal of the AFM investigations was to establish the molecular
mechanisms of synergy or antagonism between step pinners and kink
blockers in inhibiting the growth of β-haematin crystals. Using AFM
imaging at the mesoscopic scale, we demonstrate that step pinners
and kink blockers cooperate in suppressing both the nucleation of
new layers and the propagation of steps on haematin crystal surfaces.
The nucleation of new layers at random locations on the crystal surface
requires observations at the mesoscopic length scale, within the range
of capabilities of standard AFM techniques. Images with molecular
resolution of growing steps would have provided additional insights.
As shown in our previous work on haematin crystallization, imaging
with resolution comparable to the size of the haematin molecule,
around 1 nm, is possible during in situ AFM monitoring of flat crystal
planes. The presence of steps, however, disrupts the contact between
the scanning tip and the crystal surface and lowers the image resolution.
Strict numerical correspondence between discrete molecular-level
events and the mesoscopic and macroscopic variables that characterize
crystal growth and inhibition has been established in our earlier
work. This correspondence supports the molecular mechanisms
based on observations at mesoscopic length scales.


Determination of the surface free energy of the step edge $\gamma$ in
the presence of MQ and AQ

We evaluate the value of $\gamma$ from the correlation between the radius of
the two-dimensional nucleus of new layers $R_c$ and the supersaturation,
similar to previous determinations in solutions without inhibitors
carried out by Olafson et al. The critical radius $R_c$ for layer nucleation
is defined as the threshold size above which an island has a higher
probability to grow. Islands of size $R < R_c$ are more likely to dissolve.
We monitored the size evolution of all newly generated islands from
time-resolved sequences of in situ AFM images and classified the islands
as growing or dissolving. The largest sizes reached by dissolving islands
and the threshold, above which all islands grew, were averaged to yield
$R_c$. We made 25 to 30 independent $R_c$ measurements at
each combination of haematin and MQ or AQ concentration. Six concentrations
of haematin $c_H$ were tested in the presence of 2.5 µM MQ
and seven in the presence of 2.5 µM AQ. The $R_c$ values obtained at each
concentration of the two inhibitors were averaged and plotted as a
function of the supersaturation $\Delta\mu = k_B T \ln(c_H/c_e)$, and were compared
with the values of $R_c$ in the absence of inhibitors (Fig. 3b, c).

The Gibbs-Thomson relation $R_c = \Omega\gamma/\Delta\mu$, where $\Omega$ = 0.708 nm$^3$ is the
molecular volume in the crystal, prescribes the values of $\gamma$ corresponding
to each of the $R_c(\Delta\mu)$ correlations: 25 ± 2 mJ m$^{-2}$ in solution without
inhibitors, 20 ± 2 mJ m$^{-2}$ in the presence of MQ and 22 ± 1 mJ m$^{-2}$ in the
presence of AQ. The standard deviations of the three $\gamma$ values arise
from the regression analyses of the linear correlations $R_c(\Delta\mu^{-1})$ and
reveal that the confidence intervals of $\gamma$ at the three tested solution
compositions partially overlap.
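The regression described above can be sketched as a least-squares line through the origin of $R_c$ against $1/\Delta\mu$, whose slope is $\Omega\gamma$. The code below is an illustration only: it assumes the fluid-cell temperature of 27.8 °C applies, uses synthetic $R_c$ values generated from the relation itself rather than measured data, and the function names are hypothetical.

```python
K_B = 1.380649e-23   # Boltzmann constant, J/K
OMEGA = 0.708e-27    # molecular volume, m^3 (0.708 nm^3, from the text)
T = 273.15 + 27.8    # K; assumes the AFM fluid-cell temperature


def delta_mu(sigma, temperature=T):
    """Driving force Δμ = k_B T σ with σ = ln(c_H/c_e), in joules."""
    return K_B * temperature * sigma


def critical_radius(gamma, sigma, temperature=T):
    """Gibbs-Thomson radius R_c = Ωγ/Δμ in metres (γ in J/m^2)."""
    return OMEGA * gamma / delta_mu(sigma, temperature)


def fit_gamma(sigmas, radii, temperature=T):
    """Least-squares slope through the origin of R_c versus 1/Δμ, divided by Ω."""
    x = [1.0 / delta_mu(s, temperature) for s in sigmas]
    slope = sum(xi * ri for xi, ri in zip(x, radii)) / sum(xi * xi for xi in x)
    return slope / OMEGA


# Synthetic check: radii generated with γ = 25 mJ/m^2 are recovered by the fit.
sigmas = [0.30, 0.45, 0.56, 0.70]
radii = [critical_radius(0.025, s) for s in sigmas]
print(fit_gamma(sigmas, radii))              # close to 0.025 J/m^2
print(critical_radius(0.025, 0.56) * 1e9)    # about 7.6 nm
```

With real data the scatter of the points about the fitted line would set the uncertainty of $\gamma$; that step is omitted here.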

We analysed the similarity between the three individual values of $\gamma$ by
one-way analysis of variance, a statistical procedure that compares the
variance between two groups to the variance within each group of data.
We computed individual $\gamma$ values from each $R_c$ measurement and examined
the similarity between three pairs of $\gamma$ datasets: no inhibitor/AQ,
no inhibitor/MQ and MQ/AQ. The analysis of variance test parameters
are listed in Extended Data Table 2. The three F values, corresponding to
the ratio of the variances within each pair of datasets, are significantly
greater than the critical values for groups consisting of 195, 177 and
297 independent measurements. The P values were of the order of
10°, 10° and 10”, respectively, smaller than the significance level of
0.05. These F and P values consistently indicate that the hypothesis of
equality of the three $\gamma$ values is rejected.
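For readers unfamiliar with the test, the F statistic of a one-way analysis of variance can be computed directly as the ratio of the between-group to the within-group mean squares. The sketch below is a generic textbook implementation in plain Python, not the authors' analysis code; the two small groups are made-up numbers, not the $\gamma$ datasets.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F is the between-group mean square divided by the within-group mean square.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more points than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Made-up illustration: two small groups with clearly different means.
f_value, df_b, df_w = one_way_anova_f([24.0, 25.0, 26.0], [19.0, 20.0, 21.0])
print(f_value, df_b, df_w)
```

The P value then follows from the F distribution with these degrees of freedom, for which a statistics library would normally be used.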


Inhibitor-inhibitor complexation 

The aim of these tests was to determine whether binary complexes
between paired inhibitors form and reduce the inhibitor concentrations.
Spectroscopic characterization of solutions of the tested
inhibitors revealed that the sum of the ultraviolet-visible absorbances
of individual inhibitors is approximately identical to the absorbance
of their combination (Extended Data Fig. 3a-d). Moreover, no shift in
absorbance peaks was observed after mixing. These results suggest that
it is unlikely that complexes form between two inhibitors.


Inhibitor-haematin-inhibitor complexation 
Complexes formed between haematin and antimalarial inhibitors have
been discussed by Egan and co-workers and the complexation constants
between haematin and antimalarial inhibitors in CBSO have been
reported by Olafson et al. Using established protocols, we tested for
the complexation between haematin and four inhibitor pairs: QN/AQ,
QN/MQ, CQ/AQ and CQ/MQ. The two tested inhibitors were dissolved
at equal concentrations in CBSO and 2 ml of this stock was mixed to a
final concentration determined by the lower inhibitor solubility. Fresh
haematin stock was diluted with CBSO to a concentration of 0.38 mM
and then titrated with a solution of the inhibitor pair. At each titration
step, the solution was gently stirred for 8-10 min to complete complexation
and a 350 µl aliquot was drawn for ultraviolet-visible spectrometry.
The ultraviolet-visible absorbances at 594 nm were measured for
40 titration steps and rescaled to account for the dilution. The rescaled
absorbance $A_{corr}$ was compared with a theoretical curve calculated from
the complexation constants of the two tested inhibitors.

The absorbance at around 594 nm displayed a clear shift to higher
wavelengths after the addition of the inhibitor mixture, which indicates
the formation of complexes. We calculated the theoretical $A_{corr}/A_0$
values (where $A_0$ is the absorbance of a pure haematin solution) for
four different models for each combination and chose the best fit from
the minimal mean squared deviation between experimental and theoretical
$A_{corr}/A_0$ values. Non-zero deviations suggest the formation of
complexes. The ultraviolet-visible spectra of solutions containing
two inhibitors and haematin indicate that in all four combinations,
even if new complexes exist, their concentration would be limited to
a level that does not appreciably attenuate the concentration of antimalarial
inhibitors in solution (Extended Data Fig. 3e-i). Therefore, the
sequestration of inhibitors due to the formation of ternary inhibitor-
haematin-inhibitor complexes is unlikely to be significant.


Kinetic Monte Carlo model of cooperativity between step 
pinners and kink blockers 

We employ a standard solid-on-solid kinetic Monte Carlo (kMC) model
of crystal growth. We use a surface of a Kossel crystal consisting of
$N_x$ = 50 by $N_y$ = 100 sites occupied by $N$ = 5,000 surface molecules. In
the kMC algorithm, a surface site is chosen at random and one of the
possible kMC actions is performed on the basis of the probability of
the various actions; $N$ repetitions of this act comprise one kMC time
step. In the absence of inhibitors, three actions are possible at a surface
site: a molecule attaches to the site, the molecule occupying the site
detaches or nothing happens (that is, the molecule remains fixed). The
probability for attachment is $dt \times \nu e^{\mu/(k_B T)}$, where $dt$ is the kMC time
step, $\nu$ is the inverse kMC timescale and $\mu$ is the chemical potential. The
probability for a molecule to detach from site $i$ is $dt \times \nu e^{E_i/(k_B T)}$, where
$E_i$ is the energy of the surface molecule at site $i$. The energy $E_i$ is evaluated
as the sum of the bond energies of the molecule with its six nearest
neighbours. In a pure crystal, the bond energy is taken to be the
same in all directions and is denoted $\varepsilon$. By expressing temperatures in
the dimensionless form $k_B T/\varepsilon$, the physical value of $\varepsilon$ is not needed.
Given that a molecule in the bulk crystal has six bonds with the energy
shared between it and its neighbours, the binding energy in the bulk
is $3\varepsilon$ per molecule and so the equilibrium chemical potential is $\mu_{equil} = 3\varepsilon$.
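The attachment and detachment probabilities can be illustrated with a few lines of code. The sketch below is not the authors' simulation code. It assumes a sign convention in which attractive bond energies are negative (so that the equilibrium chemical potential equals three bond energies), works in units of the bond-energy magnitude, and uses a hypothetical value for the product of time step and inverse timescale.

```python
import math

EPS = -1.0            # signed bond energy in units of |ε| (assumed convention)
MU_EQUIL = 3 * EPS    # equilibrium chemical potential, three bonds per molecule


def attach_probability(mu, t_star, dt_nu=0.01):
    """Probability of attachment in one update: dt*ν*exp(μ / T*).

    t_star is the dimensionless temperature k_B T / |ε|;
    dt_nu is the (hypothetical) product of time step and inverse timescale.
    """
    return dt_nu * math.exp(mu / t_star)


def detach_probability(n_bonds, t_star, dt_nu=0.01):
    """Probability of detachment for a molecule holding n_bonds
    nearest-neighbour bonds: dt*ν*exp(E_i / T*), with E_i = n_bonds * ε."""
    return dt_nu * math.exp(n_bonds * EPS / t_star)


t_star = 0.6  # hypothetical dimensionless temperature
# At equilibrium a kink molecule (three bonds) attaches and detaches equally often.
print(attach_probability(MU_EQUIL, t_star), detach_probability(3, t_star))
# Raising μ above its equilibrium value favours attachment, i.e. growth.
print(attach_probability(MU_EQUIL + 0.5, t_star) > detach_probability(3, t_star))
```

The balance at a kink site is what makes three bond energies the natural equilibrium chemical potential in this picture: molecules with fewer bonds detach more readily and those with more bonds are effectively frozen.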

Inhibitors are handled in two distinct ways. Static inhibitors function
as step pinners. They are deposited on the surface at the beginning of
a simulation and do not participate in the kMC actions. When a crystal
molecule is next to a static pinner, the bond energy between the two is
taken to be zero. Thus, the only parameter needed to characterize the
pinners is their surface density. As they do not contribute to the binding
of molecules to the crystal, the pinners disrupt and impede the growth
of surface layers. For conceptual simplicity, we arranged the pinners
in a square grid (Fig. 4c). If the pinners are too close together (that is,
if their surface density is too high), the step velocity is zero and crystal
growth is arrested. The physics of step blocking by such inhibitors,
the criterion for step pinning, and a demonstration that inhibition is
independent of the physical arrangement of the step blockers have been
extensively discussed in Lutsko et al.

A new feature of the present simulation work is the model of kink
blockers. Similar to the solute molecules, the kink blockers are dynamic.
In the presence of kink blockers, the pool of possible events at a crystal
site is expanded to include their attachment and detachment. To block
the kinks, the kink blockers must differ from the solute species and from
the step pinners. We assume, for simplicity, that kink blockers do not
bind to step pinners. We also assume that the kink blockers bind to the
molecules in the crystal with a non-zero binding energy, otherwise, they
would not exhibit a preference for kink sites. The kink blocker can only
impede step growth if the bonding is weaker than the intermolecular
bonds in the crystal, $\varepsilon$. In contrast, weakly bound inhibitors would have
a low residence time at the kinks and have little or no effect on step
growth. To reconcile these two requirements, we assume that the
kink blockers bind anisotropically. We assume that the only non-zero
bonds formed by kink blockers are to in-plane crystal molecules. Furthermore,
we assume that the in-plane bond strengths are not equal.
Two out of the four in-plane bonding directions are randomly assigned
bond strength $2\varepsilon$ and the other two, $0.5\varepsilon$. The bond energy of a solute
molecule deposited on top of a kink blocker is $\varepsilon$.

The total energy of a kink blocker surrounded by crystal molecules
is $6\varepsilon$, equal to that of the crystal molecules, so that the incorporation of kink
blockers does not change the energetics of crystal growth. However, the
asymmetry of their binding to the crystal surface modifies the kinetics
of step growth. A kink blocker attached to a kink site with an orientation
that promotes two bonds of total energy $4\varepsilon$ will be bound more strongly than
a solute molecule bound with energy $3\varepsilon$. Such kink blockers are unlikely
to detach. In contrast, the bonds that this kink blocker molecule can
form with the incoming solute molecules are weak and solute molecules
that deposit next to it are more likely to detach than if deposited in
a free kink. These dynamics impede step growth. A kink blocker
attached to a kink in an unfavourable orientation, or adsorbed at a
non-kink surface site, would have a total energy of $2.5\varepsilon$ or less and will
tend to detach.
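The energy bookkeeping for the kink blockers can be checked by enumerating every way of assigning the two strong bonds to the four in-plane directions. The sketch below is an illustration, not the simulation code; it assumes that a kink site offers two perpendicular in-plane crystal neighbours (labelled "-x" and "-y" here), and all names are hypothetical.

```python
from itertools import combinations

DIRECTIONS = ("+x", "-x", "+y", "-y")   # four in-plane bonding directions
STRONG, WEAK, TOP = 2.0, 0.5, 1.0       # bond strengths in units of ε (from the text)
KINK_NEIGHBOURS = ("-x", "-y")          # assumed in-plane neighbours at a kink site


def binding_energy(strong_pair, occupied, covered=False):
    """Total bond energy (units of ε) of a kink blocker whose strong bonds point
    along strong_pair, with crystal molecules in the occupied directions and,
    optionally, a solute molecule deposited on top of it."""
    in_plane = sum(STRONG if d in strong_pair else WEAK for d in occupied)
    return in_plane + (TOP if covered else 0.0)


def kink_energies():
    """Binding energies at a kink for all six orientations of the strong bonds."""
    return sorted(binding_energy(pair, KINK_NEIGHBOURS)
                  for pair in combinations(DIRECTIONS, 2))


print(kink_energies())                                          # [1.0, 2.5, 2.5, 2.5, 2.5, 4.0]
print(binding_energy(("+x", "-x"), DIRECTIONS, covered=True))   # 6.0 when fully surrounded
```

Only one of the six orientations gives the strongly bound state of 4 bond units; the others give 2.5 or less, consistent with the statement that unfavourably oriented blockers tend to detach.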

Our kMC model is subject to several constraints. First, the only
model parameters that one can easily vary are the bond strengths in the
various directions. Second, a foreign molecule acts as a kink blocker
if (1) it is attracted to kink sites, (2) it inhibits step growth and (3) it
has a sufficient residency time to affect the step growth dynamics.
These requirements inevitably lead to asymmetric lateral bonds
with a total binding energy in a kink site equal to or greater than the
energy of a crystal molecule in a kink site. Within these constraints, we
do not expect our results to strongly depend on the numerical values
chosen.

Errors were estimated by averaging the step velocity over windows 
of 1,000 surface updates, thus producing a set of independent esti- 
mates of the velocity during the simulations. The arithmetic average 
of these values gives the overall estimate of the step velocity and the 
root-mean-squared deviation from the average of the averages is 
used to estimate its standard deviation. The error bars reported in 
the figures are the standard errors of the step velocities calculated as 
their standard deviations divided by the square root of the number 
of samples. 
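The error estimate described above is a block average. A minimal sketch follows, assuming the instantaneous step velocity is available as a flat list with one value per surface update; the function name and the toy input are hypothetical.

```python
import math
from statistics import mean, pstdev


def block_standard_error(velocities, window=1000):
    """Block-average estimate of the step velocity and its standard error.

    The series is cut into consecutive windows, each window is averaged,
    the mean of the window averages is the overall estimate, their
    root-mean-squared deviation is the standard deviation, and the standard
    error is that deviation divided by sqrt(number of windows).
    """
    n_blocks = len(velocities) // window
    if n_blocks < 2:
        raise ValueError("need at least two complete windows")
    blocks = [mean(velocities[i * window:(i + 1) * window]) for i in range(n_blocks)]
    overall = mean(blocks)
    std_dev = pstdev(blocks)   # RMS deviation from the average of the averages
    return overall, std_dev / math.sqrt(n_blocks)


# Toy series with two windows of four updates each.
print(block_standard_error([1.0] * 4 + [3.0] * 4, window=4))
```

Averaging over windows before taking the deviation is what makes the samples approximately independent, since successive surface updates are strongly correlated.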


Data availability 


The datasets generated during and/or analysed during the current study 
are available from the corresponding authors on reasonable request. 


Code availability 


The custom computer code used in these simulations is available upon 
reasonable request to J.F.L. (jim@lutsko.com).
Extended Data Fig. 1 | Effects of step pinners and kink blockers on bulk
haematin crystallization. a, Scanning electron microscopy micrographs of
crystals grown in the presence of inhibitors at the concentrations listed in each
panel for 16 d at 23 °C. b, c, Variations of the average length-to-width, $l/w$,
aspect ratio $A_R$ of crystals grown in the presence of increasing concentrations
of CQ/MQ and CQ/AQ (b) and QN/MQ and QN/AQ (c) at the displayed ratios
relative to the $A_R$ reached after growth in pure CBSO solutions for 16 d at 23 °C.
Lines are guides for the eye. Variations of the corresponding average crystal
length $l$ and width $w$ are displayed in Fig. 1f-i. d, Isobolograms characterizing
the cooperativity in suppressing the length of β-haematin crystals. Open symbols
indicate the concentrations of individual inhibitors that elicit a certain percentage of
inhibition, referred to as ICs. Dashed lines correspond to additive
cooperativity between the paired inhibitors for a certain percentage of
inhibition and are horizontal if the inhibitor in the abscissa is inactive when
applied alone. Solid symbols represent the concentrations of the paired
inhibitors that evoke the same inhibition. Rightward shifts of the solid symbols
from the respective dashed lines indicate antagonistic cooperativity. The
corresponding combination index values are listed in Extended Data Table 1.
Extended Data Fig. 2 | Isobolograms characterizing the cooperativity of the
CQ/MQ, CQ/AQ, QN/MQ and QN/AQ inhibitor pairs. a, b, Cooperativity in
suppressing the step velocity $v$ (a) and the rate of two-dimensional nucleation
$J_{2D}$ of new layers (b). Open symbols indicate the concentrations of
individual inhibitors that elicit a certain percentage of inhibition (ICs). Dashed
lines correspond to additive cooperativity between the paired inhibitors for a
certain percentage of inhibition and are horizontal if the inhibitor in the
abscissa is inactive when applied alone. Solid symbols represent the
concentrations of the paired inhibitors that evoke the same inhibition.
Rightward shifts of the solid symbols from the respective dashed lines indicate
antagonistic cooperativity. The corresponding combination index values are
listed in Extended Data Table 1.
Extended Data Fig. 3 | Lack of complexation between kink blockers and step
pinners in the solution. a-d, Lack of CQ/MQ (a), CQ/AQ (b), QN/MQ (c) and
QN/AQ (d) complexes. The ultraviolet-visible absorption spectra of the
individual inhibitors and binary combinations indicated in the plots. The
spectra of the binary solutions are nearly identical to the sum of the spectra of
the individual inhibitors. e-l, Lack of ternary compounds that include
haematin and the CQ/MQ (e, i), CQ/AQ (f, j), QN/MQ (g, k) and QN/AQ (h, l) pairs
of inhibitors. e-h, The ultraviolet-visible spectra of haematin at concentration
$c_H$ = 0.38 mM in the presence of various combinations of QN, CQ, AQ and MQ (as
indicated in the plots) at 1:1 molar ratios, where the inhibitor concentrations
increase from top to bottom, as indicated by arrows. i-l, The relative decrease
of the absorbance of a solution with initial $c_H$ = 0.38 mM at 594 nm as a function
of the concentration of the respective inhibitor pair (1:1 ratio) compared with a
model assuming the presence of complexes of haematin with each of the
individual inhibitors in the mixture, evaluated using the haematin-inhibitor
binding constants from Olafson et al.
Extended Data Fig. 4 | The correlation between the step velocity v
and the inhibitor concentration. a-d, Data are presented in linearized
coordinates v'₀(v'₀ − v)⁻¹ and c_B⁻¹ (c_B, kink blocker concentration) according to
Supplementary equation (7), for c_B = [B] (a, c) and c_B = [H₂B] (b, d), respectively,
for MQ (a, b) and AQ (c, d). Original data on the dependence of the step velocity
on the concentration of the kink blockers MQ and AQ are from Olafson et al.
The values of the Langmuir constant K_LB determined from the slope of the
straight lines are shown. The two leftmost data points for AQ, measured at
c_AQ > 7 µM, correspond to an unphysical increase in v at increasing
concentration of AQ and were not considered in the regression analysis to
determine K_LB.
Extended Data Fig. 5 | The step velocity v in the presence of step pinners and
kink blockers, relative to that in pure solutions v₀. Data calculated using
Supplementary equation (22). The values of ε and K_LB are listed in Extended
Data Table 4. γ₀ = 25 mJ m⁻² is evaluated from the R_c determinations in Fig. 3.
K_LP = 0.0027 µM⁻¹ for CQ and 0.0013 µM⁻¹ for QN is evaluated from the v(c_P)
correlations for CQ and QN determined by Olafson et al. using Supplementary
equations (14), (17) and (19). The surface area per adsorption site, 1.12 nm², is
taken from the structure of β-haematin crystals. a, The correlation between the
velocity of a step with radius of curvature R, v_R, and the concentrations of a
step pinner (CQ or QN), c_P, and kink blocker (MQ or AQ), c_B, for the four listed
inhibitor combinations. b, The step velocity v in the presence of step pinners and
kink blockers, relative to that in pure solutions v₀, at the listed constant ratios
of kink blocker to step pinner, corresponding to experimental determinations
in Fig. 2c, e, compared with v in the presence of the listed step pinners only.
Extended Data Fig. 6 | The regions of antagonistic and synergistic
cooperativity in the plane of the concentrations of step pinners c_P and kink
blockers c_B. Solid line represents the equation (∂v_R/∂c_B) = 0, where
(∂v_R/∂c_B) follows Supplementary equation (28). This line corresponds
to additive cooperativity and divides the (c_P, c_B) plane into fields where
(∂v_R/∂c_B) < 0 marks that step pinners and kink blockers cooperate
synergistically, and (∂v_R/∂c_B) > 0 indicates antagonistic cooperativity
between the two inhibitors.


Extended Data Table 1 | The combination index for the four listed step pinner/kink blocker pairs

IC                 CQ/MQ    CQ/AQ    QN/MQ    QN/AQ
Crystal length
10                 0.223    1.628    0.232    0.359
20                 0.217    2.548    0.230    0.419
30                 0.215    2.545    0.227    0.460
40                 0.698    1.560    0.297    0.232
50                 1.089    1.266    0.318    0.240
Step velocity
10                 1.260    1.263    2.200    0.875
20                 1.251    1.917    2.256    1.107
30                 1.164    1.783    1.984    0.968
40                 1.208    1.325    1.962    0.920
50                 1.039    1.099             0.975
Nucleation rate
10                 0.460    0.372    0.438    0.470
20                 0.995    1.033    0.726    0.843
30                 1.403    1.608    1.072    1.153
40                 1.603    1.983    1.306    1.285
50                 1.731    2.189    1.525    1.394

Combination index  Classification
<0.7               Synergism
0.7-0.85           Moderate synergism
0.85-0.9           Slight synergism
0.9-1.1            Nearly additive
1.1-1.2            Slight antagonism
1.2-1.4            Moderate antagonism
>1.4               Antagonism

The combination index was calculated for the inhibition of crystal length in bulk crystallization experiments (corresponding to isobolograms in Extended Data Fig. 1d), the step velocity
(isobolograms in Extended Data Fig. 2a) and the two-dimensional nucleation rate of new crystal layers (isobolograms in Extended Data Fig. 2b). A classification of combination index values as
synergy, additivity and antagonism is provided at the bottom.
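
As an illustration of how an isobologram point maps onto a combination index, the following is a minimal sketch. It assumes the standard Loewe-additivity form, CI = d1/IC1 + d2/IC2, which the caption does not spell out, so the formula and the function names are assumptions rather than the authors' procedure. The classification bins follow the legend of Extended Data Table 1 as far as it can be read; the 1.4 edge of the last bin is uncertain.

```python
import math


def combination_index(d1, d2, ic1, ic2):
    """Loewe-additivity combination index for a two-inhibitor mixture.

    d1, d2   -- concentrations of the two inhibitors in the mixture that
                together evoke a given percentage of inhibition
    ic1, ic2 -- concentrations of each inhibitor that evoke the same
                inhibition when applied alone (math.inf if inactive alone)
    """
    if ic1 <= 0 or ic2 <= 0:
        raise ValueError("single-inhibitor concentrations must be positive")
    # An inhibitor that is inactive alone (ic = inf) contributes zero,
    # which corresponds to a horizontal additive line in the isobologram.
    return d1 / ic1 + d2 / ic2


def classify(ci):
    """Map a combination index onto the classes in the table legend."""
    bins = [
        (0.7, "Synergism"),
        (0.85, "Moderate synergism"),
        (0.9, "Slight synergism"),
        (1.1, "Nearly additive"),
        (1.2, "Slight antagonism"),
        (1.4, "Moderate antagonism"),  # upper edge is an assumption
    ]
    for upper, label in bins:
        if ci < upper:
            return label
    return "Antagonism"


# Hypothetical concentrations, chosen only to exercise the functions.
ci = combination_index(d1=1.0, d2=2.0, ic1=4.0, ic2=4.0)
print(ci, classify(ci))
# Values taken from the crystal-length block of the table.
print(classify(0.223), classify(2.548))
```

Under this definition a value below 1 means the mixture reaches the target inhibition at lower doses than additivity predicts, and a value above 1 means it needs more.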
Extended Data Table 2 | The analysis of variance test parameters

Source of         Sum of     Degrees of   Mean
variation         squares    freedom      squares   F       P-value    F critical

No inhibitor/AQ
Between groups    100.7      1            100.7     9.43    0.00247    3.89
Within groups     1858.3     194          10.6
Total             1959.1     195
Confidence interval: 95%

No inhibitor/MQ
Between groups    327.2      1            327.2     23.15   3.50E-06   3.90
Within groups     2204.9     176          14.1
Total             2532.2     177
Confidence interval: 95%

MQ/AQ
Between groups    300.7      1            300.7     25.3    8.33E-07   3.87
Within groups     3581.9     296          12.1
Total             3888.6     297
Confidence interval: 95%

The parameters were used to test the distinction between the values of the surface free energy γ in haematin solution in the absence of inhibitors and in the presence of AQ or MQ.
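
The columns of the table are linked by the usual one-way analysis of variance relations: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the between-group to the within-group mean square. The short sketch below recomputes these for the MQ/AQ block; the function name is illustrative. The recomputed F comes out near the tabulated 25.3 rather than identical to it, which may reflect rounding of the printed entries.

```python
def one_way_anova_f(ss_between, df_between, ss_within, df_within):
    """Return (MS_between, MS_within, F) for a one-way ANOVA table."""
    if df_between <= 0 or df_within <= 0:
        raise ValueError("degrees of freedom must be positive")
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Sums of squares and degrees of freedom from the MQ/AQ block of the table.
ms_b, ms_w, f_value = one_way_anova_f(300.7, 1, 3581.9, 296)
print(f"MS between = {ms_b:.1f}, MS within = {ms_w:.1f}, F = {f_value:.2f}")

# The null hypothesis (equal means) is rejected when F exceeds F critical.
f_critical = 3.87  # value listed in the table for this comparison
print("difference significant at the 95% level:", f_value > f_critical)
```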


Extended Data Table 3 | Concentrations of free haematin [H], free inhibitors [B] and kink blocker-haematin complexes [H₂B]

c_H,   c_B,   [H],    [B],    [H₂B],   [B]⁻¹,   [H₂B]⁻¹,   v'₀,     v,       v'₀(v'₀ − v)⁻¹
mM     µM     mM      µM      µM       µM⁻¹     µM⁻¹       nm s⁻¹   nm s⁻¹

MQ
0.28   2.0    0.278   0.948   1.025    1.054    0.976      0.983    0.654    2.990
0.28   3.0    0.277   1.486   1.594    0.673    0.627      0.973    0.673    3.239
0.28   4.0    0.276   1.936   2.063    0.516    0.485      0.966    0.579    2.500
0.28   9.0    0.271   4.501   4.619    0.222    0.216      0.923    0.500    2.182
0.28   14.0   0.266   7.106   7.034    0.140    0.142      0.883    0.470    2.140

AQ
0.28   0.66   0.279   0.016   0.65     60.5     1.534      0.989    0.776    4.637
0.28   1.2    0.278   0.029   1.15     34.0     0.864      0.981    0.711    3.632
0.28   1.5    0.277   0.038   1.48     26.0     0.672      0.975    0.679    3.298
0.28   2.0    0.276   0.051   2.00     19.3     0.498      0.967    0.623    2.814
0.28   4.1    0.272   0.106   4.01     9.41     0.249      0.934    0.583    2.663
0.28   7.1    0.266   0.190   6.95     5.21     0.144      0.885    0.584    2.948
0.28   15.0   0.251   0.457   14.6     2.19     0.068      0.755    0.635    6.302

Concentrations modified by inhibitor-haematin complexation were evaluated at analytical concentrations of haematin c_H and inhibitor c_B using complexation constants 14 and 510 mM⁻² for MQ
and AQ, respectively. Evaluation of v'₀, lowered from the step velocity in the absence of inhibitors v₀ owing to the decrease of haematin concentration from c_H to [H]. The variables v'₀(v'₀ − v)⁻¹
and c_B⁻¹ of the linearized form of the correlation between v and c_B, Supplementary equation (7), for c_B = [B] and c_B = [H₂B], respectively.
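
The linearized variables in the table lend themselves to a straight-line fit. The sketch below assumes a Langmuir kink-blocking model, v = v'₀[1 − εKc/(1 + Kc)], which rearranges to v'₀/(v'₀ − v) = 1/ε + (1/(εK))(1/c). Supplementary equation (7) is not reproduced here, so this functional form, the unweighted least-squares fit and the function names are assumptions. The data are the MQ rows with c_B = [B].

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# MQ rows of the table: 1/[B] in uM^-1 and v'0 / (v'0 - v), dimensionless.
inv_b = [1.054, 0.673, 0.516, 0.222, 0.140]
y_lin = [2.990, 3.239, 2.500, 2.182, 2.140]

slope, intercept = fit_line(inv_b, y_lin)

# Under the assumed model: intercept = 1/epsilon, slope = 1/(epsilon * K).
epsilon = 1.0 / intercept
k_langmuir = intercept / slope  # in uM^-1

print(f"epsilon = {epsilon:.2f}, K = {k_langmuir:.2f} uM^-1")
```

With these five points the fit gives a limiting kink occupancy near 0.5 and a Langmuir constant a little below 2 µM⁻¹, in the neighbourhood of the MQ values listed in Extended Data Table 4; exact agreement is not expected, since the regression details used for the published values are not given here.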
Extended Data Table 4 | The Langmuir constant for adsorption of MQ and AQ at kinks

                              Assuming unliganded      Assuming the complexes
                              MQ and AQ are the        H₂MQ and H₂AQ are the
                              active inhibitors        active inhibitors
                              MQ         AQ            MQ         AQ
K_LB, µM⁻¹                    1.9        50            1.64       1.74
ε                             0.54       0.4           0.54       0.4

[B] or [H₂B] at c_B = 2.5 µM  1.2        0.06          1.3        2.45
εK_LB c_B                     1.29       1.09          1.15       1.7
ε ln(1 + K_LB c_B)            0.64       0.55          0.62       0.67

The Langmuir constant K_LB and the limiting fraction of occupied kinks ε determined from the linear plots in Extended Data Fig. 4 assuming that unliganded MQ and AQ are the active inhibitors
and, alternatively, that the complexes H₂MQ and H₂AQ are the active inhibitors. Evaluation of [MQ] and [AQ] or [H₂MQ] and [H₂AQ] at c_B = 2.5 µM, at which the inhibitor effects on the surface free
energy of the step edge γ were measured, and of the factors εK_LB[B], ε ln(1 + K_LB[B]), εK_LB[H₂B] and ε ln(1 + K_LB[H₂B]) used in the evaluation of Δγ in the presence of an inhibitor.
In conventional intercalation cathodes, alkali metal ions can move in and out of a
layered material with the charge being compensated for by reversible reduction and
oxidation of the transition metal ions. If the cathode material used in a lithium-ion or
sodium-ion battery is alkali-rich, this can increase the battery’s energy density by
storing charge on the oxide and the transition metal ions, rather than on the transition
metal alone. There is a high voltage associated with oxidation of O²⁻ during the first
charge, but this is not recovered on discharge, resulting in reduced energy density.
Displacement of transition metal ions into the alkali metal layers has been proposed
to explain the first-cycle voltage loss (hysteresis). By comparing two closely
related intercalation cathodes, Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ and Na₀.₆[Li₀.₂Mn₀.₈]O₂, here we
show that the first-cycle voltage hysteresis is determined by the superstructure in the
cathode, specifically the local ordering of lithium and transition metal ions in the
transition metal layers. The honeycomb superstructure of Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂,
present in almost all oxygen-redox compounds, is lost on charging, driven in part by
formation of molecular O₂ inside the solid. The O₂ molecules are cleaved on discharge,
re-forming O²⁻, but the manganese ions have migrated within the plane, changing the
coordination around O²⁻ and lowering the voltage on discharge. The ribbon
superstructure in Na₀.₆[Li₀.₂Mn₀.₈]O₂ inhibits manganese disorder and hence O₂
formation, suppressing hysteresis and promoting stable electron holes on O²⁻ that are
revealed by X-ray absorption spectroscopy. The results show that voltage hysteresis
can be avoided in oxygen-redox cathodes by forming materials with a ribbon
superstructure in the transition metal layers that suppresses migration of the
transition metal.


During the first charge-discharge cycle, the cathode material
Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ exhibits voltage loss, in clear contrast to
Na₀.₆[Li₀.₂Mn₀.₈]O₂, which does not, despite their very similar compositions
(Fig. 1a, c). Both materials possess the P2-type structure (Extended
Data Fig. 1), composed of Na⁺ ions in trigonal prismatic (P) coordination
and with two transition metal (TM) oxide, TMO₂, slabs required to
describe the repeat stacking sequence (Fig. 1b). However, they exhibit
different superstructures, specifically different ordering of the Li
and Mn in the TM layer (Fig. 1d, e). Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ has honeycomb
ordering, as observed in the majority of O-redox materials, whereas
Na₀.₆[Li₀.₂Mn₀.₈]O₂ has a different ordering, composed of ribbons of
Mn (Extended Data Fig. 2).

Confirmation that both materials are dominated by O-redox was
obtained by operando electrochemical mass spectrometry (OEMS) and
Mn L-edge X-ray absorption spectroscopy (XAS) along with resonant
inelastic X-ray scattering (RIXS). In the case of Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂, the
data demonstrating that this is an O-redox compound are reported
elsewhere; no O-loss was observed. Similarly, for Na₀.₆[Li₀.₂Mn₀.₈]O₂, no
evidence of O-loss is observed by OEMS (Extended Data Fig. 3).
XAS and RIXS identify electron holes on O (Extended Data Fig. 4).


Honeycomb superstructure lost, ribbon retained 


Powder X-ray diffraction (PXRD) data for Na₀.₆[Li₀.₂Mn₀.₈]O₂ are presented
in Fig. 2b. At the end of charge, the diffraction peaks belonging
to the P2 phase have reduced in intensity. New peaks, notably the broad
peak at 16.5° in 2θ and peaks at 37° and 66°, have appeared. These peaks
correspond to the most prominent peaks indexed on an O2 structure
(002, 101 and 110 peaks, respectively). Similar changes in the PXRD have
been observed for other charged P2-type Na[TM]O₂ compounds.
Upon sufficient desodiation, the TMO₂ slabs glide along a number of
unique crystallographic vectors, changing the coordination environment
of ions in the alkali metal (AM) layers from trigonal prismatic
(P) to octahedral (O), with reduced interlayer spacing. These phases
Fig. 1 | Electrochemistry and structure of honeycomb- and ribbon-ordered
cathode materials. a, c, First-cycle voltage curves for (a) Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂
and (c) Na₀.₆[Li₀.₂Mn₀.₈]O₂. As discussed in the text, the dominant species
extracted on charge is Na⁺, not Li⁺. Discharge limit, 2.0 V; charge limit, 4.5 V; rate
10 mA g⁻¹. b, Structural model of P2-type Na[TM]O₂ with no in-plane ordering;
space group P6₃/mmc. Oxide layers stack in ABBA sequence giving AM layers of
trigonal prismatic Na⁺ sandwiched between TM layers of octahedral Mn
partially substituted with Li. d, Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ possesses the P2 structure
with the well-known honeycomb superstructure ordering of Li and Mn within
the TM layer; space group P6₃. Two-atom dumbbells (Mn-Mn) along the [010]
direction, which are characteristic of the honeycomb superstructure, are seen


commonly exhibit broad diffraction peaks due to the existence of stacking
faults. The two-phase P2 to O2 transition is consistent with
the plateau in the electrochemistry observed in Fig. 1c. The changes
in the PXRD observed on charging are reversed on discharge, with the
crystalline P2-phase being recovered at the end of discharge.

A reduction in peak intensity and peak broadening is also observed
in Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ upon charging (Fig. 2a), as is evidence of a new
broad peak at around 18° characteristic of O-type layer stacking with
a contracted layer spacing. It is clear, however, that the O-type phase
here is not well crystallized as it does not exhibit sharp, well-defined
PXRD peaks. The charging plateau for Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ exhibits a
gentle slope, consistent with the P2-O2 transition in this case occurring
through a continuously evolving intergrowth of O stacking faults
in the P structure.

Nuclear magnetic resonance data (⁶Li NMR, discussed below) are
sensitive to all of the Li whether in crystalline or non-crystalline regions,
and thus NMR is the best technique to follow changes in Li. For both
materials, NMR revealed substantial displacement of Li⁺ from the TM
layer into sites of octahedral coordination in the AM layer upon desodiation,
further confirming the presence of O-type stacking faults to
accommodate the Li⁺ ions. We therefore conclude that transformation
from P2 to O-type stacking is near-complete in both materials at
high states of charge. The PXRD, being sensitive to crystalline regions,
does not show so clearly the evolution of O2, especially in the case
with ADF-STEM. Note that the honeycomb superstructure dominates although
the composition is not that of the ideal honeycomb (Li₁/₃Mn₂/₃). Colour-coding
in insets is the same as the other structural figures in d and e: purple, Mn; blue,
Li; red, O; green, Na. e, Na₀.₆[Li₀.₂Mn₀.₈]O₂ possesses the same P2 structure but
with the superstructure ordering of Li and Mn within the TM layer instead
forming ribbons. This superstructure ordering gives rise to unique diffraction
peaks in the PXRD pattern (highlighted blue) and can be fully indexed to the
P2₁/c space group; see Extended Data Fig. 2. It also gives rise to four-atom
dumbbells (Mn-Mn-Mn-Mn) when viewed along the [010] direction, as
observed in the ADF-STEM images.


of Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂, because the O2 phase is disordered, giving
rise to fewer diffraction peaks than Na₀.₆[Li₀.₂Mn₀.₈]O₂. Interestingly,
the diffraction peaks arising from in-plane ordering, as highlighted
in Fig. 1, are recovered on discharge for Na₀.₆[Li₀.₂Mn₀.₈]O₂ but not for
Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂.

At the end of charge, the sharp line observed in the ⁶Li magic-angle-spinning
(MAS) NMR spectrum for Li in the TM layer of pristine
Na₀.₆[Li₀.₂Mn₀.₈]O₂ disappears, and instead a new Li⁺ environment with
a different frequency (centred at 720 ppm, shaded green) appears in its
place (Fig. 2f, middle panel). This new shift is in close alignment with
those observed for Li⁺ in octahedral coordination within AM layers in
other layered compounds. It is accordingly assigned to Li residing
in the AM layer. After discharge (Fig. 2f, lower panel), the new shift
disappears and the original shift is re-formed with the same sharp line
shape, indicating that Li is again predominantly in its original
octahedral TM layer sites. Further NMR data were collected at intermediate
states of charge (Extended Data Fig. 5), which confirm that
this process occurs via a two-phase mechanism.

The ⁶Li NMR results for Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ also show that Li⁺ is displaced
from the TM layers to octahedral sites in the AM layers on charge;
an ensemble of shifts centred at 600 ppm is observed in the spectrum
for Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ on charge (Fig. 2e, middle panel). However, on
discharge, as the lithium returns to vacancies in the TM layer, a considerable
broadening of the ensemble of isotropic chemical shifts centred
Fig. 2 | Evidence for the loss of honeycomb ordering and retention of ribbon
ordering on the first cycle. a, b, PXRD data for honeycomb-ordered
P2-Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ and ribbon-ordered P2-Na₀.₆[Li₀.₂Mn₀.₈]O₂, respectively:
pristine; after charging to 4.5 V; and after discharging to 2 V. Insets highlight
the superstructure peak region. Superstructure peaks corresponding to the
ribbon ordering reappear on discharge, indicating retention of in-plane
ordering, whereas for the honeycomb the superstructure peak at 22° is lost
irreversibly. c, d, STEM-ADF images. Projections are parallel to the ab plane
along the [010] zone-axis. Light and dark contrast corresponds to the heavier
(Mn) and lighter (Li, Na and O) scattering respectively. The ribbon
superstructure is retained in the case of Na₀.₆[Li₀.₂Mn₀.₈]O₂ on charge and
discharge, although there is some evidence of disorder in the discharged state,
which is discussed in the text. In contrast, the two-atom (Mn-Mn) dumbbells
within the TM layers almost completely disappear, showing that honeycomb
ordering is lost. There is some evidence of scattering between the TM layers in
the charged honeycomb structure, which might indicate some displacement of
Mn into the AM layers, but the dominant Mn disorder is in-plane. e, f, ⁶Li MAS
NMR data. Peaks corresponding to the isotropic shifts for Li in the TM layers,
Li_TM, are shown in purple, and those corresponding to Li in AM layers, Li_AM, are
shown in green. All other peaks are spinning sidebands. For ribbon-ordered
Na₀.₆[Li₀.₂Mn₀.₈]O₂, lithium migrates from TM to AM layers on charging, and this
is reversed on discharge. For honeycomb-ordered Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂, lithium
again migrates from TM to AM layers on charging; however, on discharge, Li
repopulates a TM layer with different local ordering as a result of Mn migration,
labelled Li_TM*. Small amounts of Li₂MnO₃ and Li-containing diamagnetic
impurities are shaded in black and red, respectively.
Fig. 3 | Spectroscopic evidence for O₂ formation and stable electron holes on
O²⁻. a, c, Oxygen K-edge XAS and high-resolution RIXS spectra recorded at an
excitation energy of 531 eV for (a) honeycomb-ordered Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ and
(c) ribbon-ordered Na₀.₆[Li₀.₂Mn₀.₈]O₂ in the pristine, charged (4.5 V), and
discharged (2 V) states. The red highlighted pre-edge feature at 531 eV and RIXS
features A and B are characteristic of O-redox. b, The high-resolution RIXS
spectrum for molecular O₂ at 530.3 eV (reproduced with permission from
Arhammar et al.). d, With high-resolution RIXS, feature B in a is resolved into a


around 1,800 ppm is observed (Fig. 2e, lower panel). This is indicative
of a range of new Li environments generated from different local
ordering of Mn. The results are in contrast to the ribbon superstructure
ordering of Na₀.₆[Li₀.₂Mn₀.₈]O₂, which does not change and therefore
provides the same sites for Li⁺ as were present in the pristine material:
that is, TM ordering is largely retained in Na₀.₆[Li₀.₂Mn₀.₈]O₂ but not in
Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂. This is evidence that Mn is mobile in honeycomb-ordered
Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂. Integration of the Li signals for both materials
shows that any loss of Li from the structure is <1% (that is, below
the limit of detection) and all of the Li that is displaced returns to the
TM layers. This also confirms that Na⁺, not Li⁺, is the dominant species
removed and reinserted into the structure on charge/discharge.
Annular dark-field scanning transmission electron microscopy (ADF-STEM)
data indicate that the honeycomb superstructure of the pristine
Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ is almost entirely lost after charging to 4.5 V. Viewed
along the [010] direction (Fig. 2c), the two-atom (Mn-Mn) dumbbells
are less clearly resolved in most parts of the image after charging to
4.5 V, indicating loss of the in-plane order. After discharge to 2 V, there
is virtually no periodic variation in contrast along the layers, showing
the honeycomb is completely lost. In contrast to this, the ADF-STEM
image of Na₀.₆[Li₀.₂Mn₀.₈]O₂ along the [010] direction shows retention
of the four-atom (Mn-Mn-Mn-Mn) configuration associated with
the ribbon ordering described in Fig. 1e, with some slight disorder
evident at the end of discharge. These results are in agreement with
the PXRD and NMR data, which show predominantly O-type stacking
in both cases. Further images from different regions of each cycled
sample are included in Extended Data Fig. 6, showing the structural
changes more comprehensively. It is important to note that PXRD and
NMR, unlike STEM, sample the whole material; the consistency between all
three techniques therefore indicates that the STEM results are
representative of the material as a whole and reinforces the interpretation.
Together the NMR, PXRD and STEM data show that the honeycomb
superstructure is unstable on charging. The Li⁺ ions are displaced to the
AM layers, and irreversible in-plane migration of Mn results in a more
disordered arrangement of Mn and vacancies in the TM layer in the
charged state of Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂. The honeycomb ordering is not


progression of energy-loss peaks, arising from the vibrations of the O-O bond
with a fundamental vibrational frequency, ν, of approximately 1,600 cm⁻¹
matching that of molecular O₂ and that expected from the 1.2-Å O-O bond in
the Mn-η¹-O₂ species predicted from DFT. e, Literature values for the bond
lengths and frequencies of O-O dimers for comparison. The O K-edge XAS
spectrum for ribbon-ordered Na₀.₆[Li₀.₂Mn₀.₈]O₂ in the charged state shows the
formation of stable electron holes (h⁺) on O²⁻ (green) at low energy (high
voltage).


recovered on discharge, with the consequence that the Li⁺ ions return to
different sites in the TM layer. In contrast, for the ribbon superstructure
in Na₀.₆[Li₀.₂Mn₀.₈]O₂, the ordering remains predominantly unchanged,
and the high voltage on charge is retained on discharge.


Molecular O₂ or stable e⁻ holes


Density functional theory (DFT) calculations were performed on structural
models of the charged state, O2-Na₀Li₀.₂₅Mn₀.₇₅O₂, with Li⁺ in the
AM layer, vacancies in the TM layer and different in-plane configurations
of Mn. As shown in Extended Data Fig. 7a, many configurations
are very similar in energy to the honeycomb arrangement (within
about 20-30 meV per formula unit (f.u.), comparable to the thermal
energy, kT = 25.7 meV) with one notable exception that was substantially
(225 meV per f.u.) lower in energy. In this arrangement, TM vacancies
cluster together, resulting in replacement of the oxygens coordinated
by two Mn (O-Mn₂), which occurs for all O in the honeycomb structure,
with O coordinated by three Mn (O-Mn₃) and O atoms completely
decoordinated from Mn, which dimerize with one other O bonded to
one Mn. The O-O bond length is 1.2 Å (directly comparable to that of
molecular O₂, 1.208 Å), and the Mn-O distance is 2.2 Å, consistent with
a weak Mn-O bond and hence formation of a Mn-η¹-O₂ moiety (where η
indicates the hapticity) containing molecular O₂. The dimerization of
O to form Mn-η¹-O₂ lowers the overall energy of the charged structure
and drives the TM migration.
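
To put these energy differences in perspective, the sketch below evaluates kT and the corresponding Boltzmann factors. The temperature of 298.15 K is an assumption, chosen because it reproduces the quoted kT of about 25.7 meV, and the function names are illustrative. The comparison is a rough equilibrium estimate only; it says nothing about migration barriers.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def thermal_energy_mev(temperature_k):
    """Thermal energy kT in meV at the given temperature in kelvin."""
    return K_B_EV_PER_K * temperature_k * 1000.0


def boltzmann_factor(delta_e_mev, temperature_k=298.15):
    """Relative equilibrium population of a state delta_e_mev higher in energy."""
    return math.exp(-delta_e_mev / thermal_energy_mev(temperature_k))


print(f"kT at 298.15 K = {thermal_energy_mev(298.15):.1f} meV")

# Configurations within 20-30 meV per f.u. of each other are comparably
# populated, whereas a state 225 meV per f.u. higher in energy is not.
for delta in (20.0, 30.0, 225.0):
    print(f"dE = {delta:5.1f} meV -> relative population {boltzmann_factor(delta):.2e}")
```

On this estimate the configurations lying within 20-30 meV of one another are thermally accessible, while the vacancy-cluster arrangement is favoured over them by several orders of magnitude.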

To evaluate what impact this structural change has on the discharge
voltage, we calculated this quantity directly using the computed energies
of the charged and discharged structures. To model the discharged
state, the Mn and vacancy arrangement corresponding to the deep
energy minimum described above was retained and the AM- and TM-layer
vacancy cluster repopulated with Na⁺ and Li⁺ respectively. The
resulting relaxed structure no longer possessed the short O-O distance
(now 2.6 Å); the O-O bond of molecular O₂ is cleaved on discharge
and fully reduced O²⁻ formed (Extended Data Fig. 7c). A value of 3.2 V
was obtained from the calculation of the discharge voltage, in good
agreement with that observed from the electrochemistry (Fig. 1a),
Fig. 4 | Electronic and structural changes accompanying O redox. a, Density
of states (DoS) plots from DFT for honeycomb-ordered Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ at
different states of charge. Pristine state: honeycomb-ordering with frontier
Na⁺-O 2p-Li⁺ oxygen states that are oxidized on charge. Charged state: Li sites
in the TM layer are vacant, in-plane migration of Mn (along arrows) forms
vacancy clusters triggered by formation of molecular O₂ (1.2 Å). Electron holes
localize in molecular π* spin-down states of O₂, represented by red shading in
the DoS. Discharged structure: Electrons populate the antibonding π* and σ*
states, breaking the O-O bond and forming two O²⁻ (2.6 Å apart). These oxide


supporting the structural models obtained from DFT. We considered a
range of other Mn-disordered structures, and all were higher in energy
than the vacancy cluster reported here. Also, the other disordered configurations
could not produce a voltage as close to the experimentally
observed voltage. Similar voltage calculations were carried out for the
ribbon structure, Na₀.₆[Li₀.₂Mn₀.₈]O₂, where, assuming that no change
ions are surrounded by alkali ions in the structure and thus are at the top of the
valence band (purple in DoS). b, Layer stacking at different states of charge for
both Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ and Na₀.₆[Li₀.₂Mn₀.₈]O₂. c, DoS plots from DFT for
ribbon-ordered Na₀.₆[Li₀.₂Mn₀.₈]O₂. Pristine state: there are two unique O
environments, O-Mn₂ (blue) and O-Mn₃ (green). Charged state: electron holes
localize on O-Mn₂ close to the bottom of the conduction band (blue in DoS).
Discharged state: ribbon ordering maintained, structure unchanged from
pristine. In all DoS plots, the Fermi energy level is set to zero.


in in-plane ordering occurs, a value of 4.1 V is obtained in close agree- 
ment with the observed discharge plateau. 
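
The text states only that the voltage was calculated from the computed energies of the charged and discharged structures. A minimal sketch of the usual average-voltage expression for intercalation electrodes is given below, assuming that one electron is transferred per Na inserted and that the Na chemical potential is referenced to Na metal. The function name and all numerical energies are hypothetical placeholders, not values from this work.

```python
def average_voltage(e_discharged, e_charged, n_alkali, mu_alkali):
    """Average insertion voltage in volts.

    e_discharged, e_charged -- total energies per formula unit (eV)
    n_alkali                -- alkali atoms inserted per formula unit
    mu_alkali               -- energy per atom of the alkali metal reference (eV)

    One electron per inserted alkali ion is assumed, so an energy in eV per
    transferred electron is numerically equal to a voltage in V.
    """
    if n_alkali <= 0:
        raise ValueError("n_alkali must be positive")
    reaction_energy = e_discharged - e_charged - n_alkali * mu_alkali
    return -reaction_energy / n_alkali


# Hypothetical energies, for illustration only.
voltage = average_voltage(
    e_discharged=-43.7,  # eV per f.u., discharged structure (placeholder)
    e_charged=-40.0,     # eV per f.u., charged structure (placeholder)
    n_alkali=0.75,       # Na reinserted per f.u.
    mu_alkali=-1.3,      # eV per atom, Na metal reference (placeholder)
)
print(f"average voltage = {voltage:.2f} V")
```

Because the voltage depends only on the energy difference between the end states, a charged structure stabilized by Mn migration and O₂ formation shifts the computed discharge voltage, which is the comparison the text draws between the 3.2 V and 4.1 V results.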

To explore experimentally the nature of the oxidized oxygen species
on charge, we used RIXS spectroscopy of a higher resolution than in
past studies. Our data reveal the underlying fine structure of
the elastic peak, labelled B in Fig. 3a, and show that it is composed of
Fig. 5 | Dependence of O-redox stability on superstructure. a-c, In-plane Mn
migrations (arrows) required to form O₂ molecules (orange ellipses) in the TM
layer of charged (a) honeycomb, (b) ribbon and (c) mesh arrangements. More
Mn migrations are required to form O₂ in the ribbon and mesh structures, and
Mn must migrate into sites already filled by Mn, making O₂ formation less likely.
TM layer vacancies are represented by 


a progression of energy-loss peaks associated with the vibrations of
the O-O bond and with a fundamental vibrational frequency of about
1,600 cm⁻¹ (Fig. 3d), closely matching that of molecular O₂ (Fig. 3b).
Together with the broad inelastic peak, A, these features bear strong
resemblance to the RIXS spectrum for gaseous molecular O₂ (Fig. 3b).
The ultra-high-vacuum conditions under which RIXS measurements
are made ensure that the electrode materials are fully ‘out-gassed’,
and hence O₂ molecules in the gas phase or surface adsorbed cannot
account for the RIXS observations, supporting O₂ bound with η¹ coordination
to Mn. Furthermore, both spectroscopic features A and B no
longer appear in the discharged sample, indicating that the O₂ species
is reduced on discharge.
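
For orientation, a vibrational wavenumber can be converted into the energy spacing expected between successive energy-loss peaks. The sketch below uses E = hc·ν̃ and, as a simplifying assumption, a purely harmonic ladder; real progressions narrow slightly at higher quanta because of anharmonicity. The function names are illustrative.

```python
HC_EV_CM = 1.239841984e-4  # Planck constant times speed of light, in eV cm


def wavenumber_to_ev(wavenumber_cm):
    """Energy in eV of one vibrational quantum at the given wavenumber (cm^-1)."""
    return HC_EV_CM * wavenumber_cm


def harmonic_ladder(wavenumber_cm, n_levels):
    """Energy losses (eV) of the first n_levels overtones, harmonic approximation."""
    quantum = wavenumber_to_ev(wavenumber_cm)
    return [level * quantum for level in range(1, n_levels + 1)]


quantum_ev = wavenumber_to_ev(1600.0)
print(f"1,600 cm^-1 corresponds to {quantum_ev:.3f} eV per quantum")
print([round(e, 2) for e in harmonic_ladder(1600.0, 5)])
```

A fundamental of about 1,600 cm⁻¹ therefore corresponds to peaks spaced by roughly 0.2 eV, which is why high energy resolution is needed to resolve the progression within feature B.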

Oxidation of O²⁻ results in oxygen species with an electronic driving
force for O dimerization. This is reflected in the observation of
molecular O₂ here and reports of peroxo-like (O₂)ⁿ⁻ with O-O bond
lengths of 1.4-2.5 Å in 4d- and 5d-based materials. Here we see
clear evidence that molecular O₂ (O-O 1.2 Å) forms and not O₂²⁻ (O-O
1.5 Å) or peroxo-like species (O-O 1.9-2.5 Å). The results show that
molecular O₂ can be observed in the bulk of O-redox materials and
demonstrate its important role in O-redox, in particular in voltage
hysteresis. This bulk O₂ is trapped in the vacancy clusters and has no
mechanism of diffusing through the material to the surface; however,
any O₂ that is formed at the surface can escape. Although no direct O₂
loss is seen for either Na₀.₆[Li₀.₂Mn₀.₈]O₂ or Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂, some
CO₂ is observed in both cases across the charging plateaux. It has been
shown that singlet O₂ is typically evolved from O-redox materials and
reacts with the electrolyte, forming CO₂. If O₂ release is forced
to be fast, for example by stepping to a high potential, a small amount
of O₂ can be detected.

A small signature from molecular O₂ is also seen in the RIXS for ribbon-ordered
Na₀.₆[Li₀.₂Mn₀.₈]O₂ at the end of charge. This is in accord
with the electrochemical data and ADF-STEM images, which show that
the ribbon structure of Na₀.₆[Li₀.₂Mn₀.₈]O₂ is not completely preserved
during the first cycle, and some low-voltage capacity is seen on the
first discharge. However, turning to the O K-edge XAS data presented
in Fig. 3c, in addition to the O₂ feature appearing at 531 eV, there is also
a new feature appearing before the pre-edge at 527.5 eV. This feature is
exactly where electron-hole states lying just above the Fermi energy
would be expected to appear and hence represents electronic states on
O that can be reduced at high potential (that is, O-redox without voltage
hysteresis). This is evidence of true, stable electron holes on O (that is,
Oⁿ⁻ where n<2) and is distinct from the localized holes that form on O₂.


Superstructure controls voltage hysteresis 


Hysteresis in O-redox materials has been related to the number of elec- 
tron holes formed on oxygen, with too many resulting in structural 
instability and voltage loss*’. However, there are anumber of materials 
that are less oxidized than ribbon Nap [Lip .Mno]0, (0.2 electron holes, 
h*, per O) yet still exhibit hysteresis, such as Na,RuO, (0.13 h* per O) and 
Li[Ni,3Li,.Mns,.]O, (0.13 h* per O)°. The crucial difference here is that 
the latter examples both possess honeycomb-ordered TM layers, like 
Nao 7sLLio.2;Mno75]0,, whereas Nay [Lip ,Mn s]O, exhibits ribbon order- 
ing. Further examination of the literature reveals a strong evidential link 
between superstructure ordering and voltage hysteresis which extends 
across P2 and P3 Na-ion compounds and Li-rich O3 structures. Hon- 
eycomb ordering is exhibited by the vast majority of layered O-redox 
cathodes which consistently exhibit voltage hysteresis, and the only 
known examples without voltage hysteresis have a different ordering 
scheme. P3-Na₀.₆[Li₀.₂Mn₀.₈]O₂ has the same ribbon ordering scheme
as P2-Na₀.₆[Li₀.₂Mn₀.₈]O₂, but with a different stacking sequence, as we
show in Extended Data Fig. 8, and does not exhibit voltage hysteresis.
Na₂Mn₃O₇, which has a unique in-plane ordering scheme corresponding
to its [□₁/₇Mn₆/₇] TM layer composition (where □ represents a vacancy),
also shows no voltage hysteresis*>**.

The highest-energy O 2p states in pristine honeycomb
Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ and ribbon Na₀.₆[Li₀.₂Mn₀.₈]O₂ are those
coordinated by two ionic cations (Li⁺, Na⁺) from the TM and AM layer
respectively, forming Na⁺-O 2p-Li⁺ dumbbells**. On charging, electrons are
removed from these states, oxidizing O²⁻ to form Oⁿ⁻ (n < 2) and
triggering displacement of Li⁺ from the TM to AM layers (Fig. 4b). For perfect
honeycomb ordering, all oxide ions are coordinated in the TM layer by
two Mn (O-Mn₂), are degenerate in energy and are equally susceptible
to oxidation on charging. However, this degeneracy can be broken
through Mn migration, which changes the Mn coordination of the Oⁿ⁻
ions from O-Mn₂ to more coordinated O-Mn₃, less coordinated O-Mn₁
and uncoordinated O (O-Mn₀) (Fig. 4a). The unbonded O (O-Mn₀) is
stabilized by dimerizing with the O-Mn₁, forming the Mn-η¹-O₂ moieties, as
demonstrated by DFT and RIXS above. It is this electronic driving force
that promotes Mn migration to disrupt the honeycomb superstructure.
Discharge involves reduction of the unoccupied states on O₂, triggering
cleavage of the O-O bond, the formation of fully reduced O²⁻ and
the return of Li⁺ to the TM layers. However, the Li⁺ ions now return to
different sites, instead occupying sites in the vacancy cluster. The Na⁺ ions
return to the AM layers. The discharge voltage for this process (3.2 V,
as noted above) is much lower than the voltage on charge, explaining
the first-cycle voltage hysteresis of the honeycomb superstructured
O-redox cathode. In the case of the ribbon superstructure, Mn migration
is suppressed, preventing O₂ formation and stabilizing electron
holes on O²⁻ (Figs. 3c, 4c). In the charged honeycomb structure, only
two Mn are required to migrate into adjacent vacancies to generate free
Oⁿ⁻ that can pair with a neighbouring O to form O₂ (Fig. 5a). In contrast,
because vacancies are more dispersed in the ribbon structure, multiple
Mn displacements, including sequential Mn hops, would be required to
form the TM vacancy clusters (Fig. 5b). Ribbon ordering thus provides
increased stability for high-voltage O redox (4.1 V from calculation) by
preserving the degeneracy of the O 2p states.

Ribbon ordering is not, however, 100% stable. Even on the first
cycle not all the charge capacity at 4.3 V is recovered on discharge.
Furthermore, Extended Data Fig. 9 shows that dwelling for increasing
time in the highly desodiated charged state promotes the loss
of the high-voltage discharge, suggesting increased Mn migration.
Upon extended cycling, the discharge plateaux gradually decrease in
length, with greater evidence of low-voltage capacity similar to that
of Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂. PXRD data (Extended Data Fig. 10) confirm
that after 10 cycles the diffraction peaks arising from the superstructure
ordering are reduced, and there is increasing strain broadening
within the a-b plane indicative of Mn migration and loss of the ribbon
ordering. ADF-STEM imaging (Extended Data Fig. 6) also shows loss of
ribbon ordering after 10 cycles. The loss of voltage correlates with the
loss of superstructure, further reinforcing the relationship between
the two. This irreversibility of the high-voltage plateau is also seen for
the P3-type analogue of P2-Na₀.₆[Li₀.₂Mn₀.₈]O₂ (refs. *), which exhibits
the same ribbon superstructure ordering.

Although ribbon ordering is not completely stable, a compound
possessing an ordering scheme with even more dispersed vacancy
ordering has been reported that shows higher reversibility of the
high-voltage O-redox plateau (Fig. 5c)**. This observation, underpinned by
our work revealing the critical role of superstructure in preserving
high-voltage O-redox, defines a compelling strategy in the search for
high-energy-density Li-rich cathodes.

Methods


Synthesis 

Na₀.₆[Li₀.₂Mn₀.₈]O₂ and Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ were prepared via
solid-state reaction between stoichiometric amounts of Na₂CO₃ (≥99.0%,
Aldrich), Li₂CO₃ (≥99.0%, Aldrich) and MnO₂ (≥99.0%, Aldrich). The
precursors were ball-milled for 1 h using a Retsch PM100, pressed into
pellets and calcined at 800 °C for 12 h under flowing oxygen. Heating
and cooling were conducted at a controlled rate of 10 °C min⁻¹ in
both cases, except for Na₀.₆[Li₀.₂Mn₀.₈]O₂, which was cooled at 2 °C min⁻¹.
As-prepared materials were transferred into an inert Ar atmosphere 
without exposure to air and stored for characterization. All subsequent 
procedures were carried out without exposure to air. 


Structural characterization 

X-ray powder diffraction patterns were collected using a Cu source 
Rigaku diffractometer. Neutron powder diffraction patterns were 
collected at the POLARIS diffractometer at the ISIS neutron source. 
Powders were loaded into vanadium canisters and sealed under inert 
atmosphere for measurement. Rietveld profile refinements were
performed using the GSAS suite of programs.


Electrochemical characterization 

Electrodes were prepared by mixing 80 wt% active material, 10 wt%
Super P carbon and 10 wt% polytetrafluoroethylene binder in a mortar
and pestle and rolling to form a self-standing film. Electrodes were
incorporated into CR2032 coin cells with electrolyte-soaked (NaPF₆
(Kishida) in propylene carbonate (99.7%, Sigma)) Whatman glass fibre
separators and Na metal counter electrodes. Galvanostatic charge-discharge
was carried out at a rate of 10 mA g⁻¹ using a Maccor Series 4000.


Operando electrochemical mass spectrometry 

Operando electrochemical mass spectrometry was carried out using a 
cell (ECC-Std from EL-CELL) with gas inlet and outlet ports. Argon carrier
gas was flowed at a constant rate (0.8 l min⁻¹, Bronkhorst mass-flow
controller) through the cell and into a quadrupole mass spectrometer
(Thermo Fisher) equipped with a turbomolecular pump (Pfeiffer
Vacuum).


Solid-state NMR 

Solid-state NMR experiments were performed on a 400-MHz Bruker
Avance III HD spectrometer at the ⁶Li Larmor frequency of 58.99 MHz.
All spectra were recorded with a rotor-synchronized Hahn-echo pulse
sequence. The ⁶Li spectra were externally referenced with LiCl aqueous
solution at 0.0 ppm.

For Na₀.₆[Li₀.₂Mn₀.₈]O₂, a 3.2-mm MAS probe was used. The MAS rate
was 19 kHz, and the probe temperature was controlled at 268 K. The
applied π/2 pulse length was 3.5 μs and the delay between π/2 and π
pulses was 47.4 μs (one rotor period). The transmitter frequency was
set to 1,600 ppm. For Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂, a 1.9-mm MAS probe was
used with a MAS rate of 38 kHz, and the probe temperature was set to
298 K. The applied π/2 pulse length was 2 μs and the delay between π/2
and π pulses was 23.3 μs (one rotor period). The transmitter frequency
was set to 1,600 ppm.
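The quoted echo delays (47.4 μs and 23.3 μs) are slightly shorter than the bare rotor periods at 19 kHz and 38 kHz. A minimal sketch of one reading that reproduces both numbers is given below. It assumes, without confirmation from the text, that the π pulse is twice the π/2 pulse length and that the delay is measured between pulse edges so that the pulse centres are separated by exactly one rotor period. The function names are illustrative only.

```python
def rotor_period_us(mas_rate_khz: float) -> float:
    """Rotor period in microseconds for a MAS rate given in kHz."""
    return 1000.0 / mas_rate_khz


def echo_delay_us(mas_rate_khz: float, half_pi_pulse_us: float) -> float:
    """Edge-to-edge delay that keeps pulse centres one rotor period apart.

    Assumption (not stated in the Methods): the pi pulse lasts twice as long
    as the pi/2 pulse, and half of each pulse is subtracted from the period.
    """
    pi_pulse_us = 2.0 * half_pi_pulse_us
    return rotor_period_us(mas_rate_khz) - half_pi_pulse_us / 2 - pi_pulse_us / 2


if __name__ == "__main__":
    # (MAS rate in kHz, pi/2 pulse length in microseconds) as quoted above
    for rate, p90 in [(19.0, 3.5), (38.0, 2.0)]:
        print(rate, round(rotor_period_us(rate), 1), round(echo_delay_us(rate, p90), 1))
```

Under these assumptions the two settings give 47.4 μs and 23.3 μs, matching the values reported for the 3.2-mm and 1.9-mm probes.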

The spectra were normalized by the total number of scans and the 
weight of active materials packed in the rotors. Spectra fitting and 
deconvolution were carried out with the dmfit program”. 


ADF-STEM 

ADF-STEM micrographs were collected on an aberration-corrected 
JEOL ARM 200F operated at 200 kV. The convergence semi-angle 
used was 22 mrad, and the collection semi-angle was 69.6-164.8 mrad 
(ADF). In all cases, sets of fast-acquisition multi-frame images were 
recorded and subsequently corrected for drift and scan distortions 
using SmartAlign”. 


Computation 

DFT calculations including Hubbard corrections*® were performed 
using Quantum Espresso”. We used the Perdew, Burke and Ernzer- 
hof (PBE)*° exchange-correlation functional. The core-valence inter- 
action was taken into account by using the projector-augmented 
wave (PAW) method“. The wavefunctions were represented through 
a plane-wave basis set with an energy cut-off of 70 Ry. Spin polariza- 
tion was included. All calculations were performed considering a 
ferromagnetic ordering of Mn atoms. A Hubbard U parameter of 
4 eV for Mn 3d states was used, similar to that reported for other 
closely related compounds”. To find the k-point condition, the 
total energy of the supercell was converged with respect to the 
number of k points, and convergence was reached with a 2 × 2 × 2
Monkhorst-Pack k-point grid. Crystal structures were relaxed
until forces on the atoms were less than 0.08 eV Å⁻¹ and the total
stresses on the cell were less than 0.05 kbar. The supercell for
Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂ contains 90 atoms: 18 Na atoms; 6 Li atoms;
18 Mn atoms; and 48 O atoms. 
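As a quick consistency check on the supercell described above, the atom counts can be regenerated from the formula unit. The sketch below assumes the supercell holds 24 formula units, a number inferred here from the 48 O atoms rather than stated in the text.

```python
from fractions import Fraction

# Formula unit of the honeycomb compound, Na0.75[Li0.25Mn0.75]O2
FORMULA_UNIT = {
    "Na": Fraction(3, 4),
    "Li": Fraction(1, 4),
    "Mn": Fraction(3, 4),
    "O": Fraction(2),
}


def supercell_counts(formula_unit, n_formula_units):
    """Return integer atom counts for a supercell of n formula units."""
    counts = {}
    for element, amount in formula_unit.items():
        total = amount * n_formula_units
        if total.denominator != 1:
            raise ValueError(f"{element} count is not an integer: {total}")
        counts[element] = int(total)
    return counts


if __name__ == "__main__":
    counts = supercell_counts(FORMULA_UNIT, 24)  # 24 f.u. is an inferred value
    print(counts, sum(counts.values()))
```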

For electronic structure calculations, we carried out spin-polar- 
ized DFT calculations using the HSE functional. An exact exchange 
mixing parameter of 0.25 was used for all calculations. Norm-con- 
serving pseudopotentials were used to describe the core-valence 
interaction**. The electronic wavefunctions were described using a 
plane-wave basis set with an energy cut-off of 80 Ry. A
Monkhorst-Pack k-point grid of 2 × 2 × 2 was used. The input structures were
obtained from the DFT+U lattice relaxations, and the nuclear posi- 
tions were allowed to further relax at the HSE level, keeping the lattice 
parameters fixed. 

Intercalation-deintercalation voltages (V) were computed using
the Nernst equation, V = ΔG/(zF), where ΔG is the Gibbs free energy
change, F is the Faraday constant and z is the charge that is transferred.
The change in the Gibbs free energy is defined as ΔG = ΔE + PΔV − TΔS,
where P and T are pressure and temperature, respectively, and ΔE, ΔV
and ΔS are the changes in internal energy, volume and entropy, respectively.
The first-principles calculations were carried out at 0 K and zero
pressure. Under these conditions, the Gibbs free energy change is then
given by the change in the internal energy, ΔG = ΔE. Thus, for the sodium
deintercalation-intercalation reaction Naₓ₁TmO₂ → Naₓ₂TmO₂ + (x₁ − x₂)Na,
the voltage is given by:

$$V = -\frac{E(\mathrm{Na}_{x_1}\mathrm{TmO_2}) - E(\mathrm{Na}_{x_2}\mathrm{TmO_2}) - (x_1 - x_2)\,E(\mathrm{Na})}{(x_1 - x_2)\,F}$$

where x₁ > x₂ and E(Naₓ₁TmO₂) and E(Naₓ₂TmO₂) are the internal energies
of the sodiated and desodiated transition metal (Tm) oxides, respectively,
and E(Na) is the internal energy of metallic sodium. These quantities
are obtained directly from the first-principles calculations. This
procedure is well established***>. 
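The voltage expression above can be sketched in a few lines. This is an illustrative helper, not the authors' code: it assumes energies are given in eV per formula unit, so that dividing by the number of transferred electrons (in place of the Faraday constant) gives volts directly. The function names are hypothetical. The reaction energies used in the example are the ΔE values quoted in Extended Data Fig. 7.

```python
def average_voltage(e_sodiated_ev, e_desodiated_ev, e_na_ev, x1, x2):
    """Average voltage (V) between Na contents x1 > x2.

    Energies are in eV per formula unit; with these units the Faraday
    constant reduces to one electron charge per Na, so the result is in volts.
    """
    if x1 <= x2:
        raise ValueError("x1 must be larger than x2")
    n_na = x1 - x2
    return -(e_sodiated_ev - e_desodiated_ev - n_na * e_na_ev) / n_na


def voltage_from_reaction_energy(delta_e_ev, n_na):
    """Voltage from the sodiation (discharge) reaction energy per formula unit."""
    return -delta_e_ev / n_na


if __name__ == "__main__":
    # Reaction energies and Na contents as quoted in Extended Data Fig. 7
    print(round(voltage_from_reaction_energy(-2.43, 0.75), 1))  # honeycomb
    print(round(voltage_from_reaction_energy(-2.46, 0.60), 1))  # ribbon
```

With the quoted ΔE values this gives about 3.2 V for the honeycomb compound and 4.1 V for the ribbon compound, in line with the voltages stated in the figure.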

For phases with partial Na occupancy, Na ordering was investigated 
using combinatorics. Simple random sampling was used to choose a 
representative subset of non-symmetry equivalent configurations for 
relaxation. A similar methodology was used to investigate Mn disor- 
der in the charged honeycomb phase. Fully desodiated models were 
prepared for the charged phases to make calculations more compu- 
tationally tractable. 
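The sampling step described above can be illustrated with a small sketch. It enumerates all ways of placing Na on a set of sites and draws a simple random sample. It is a simplified stand-in: it does not remove symmetry-equivalent configurations, which would require a separate symmetry analysis, and the site counts and function name are hypothetical.

```python
import itertools
import random


def sample_na_orderings(n_sites, n_occupied, n_samples, seed=0):
    """Draw a simple random sample of Na orderings.

    Each ordering is a tuple of occupied site indices. Symmetry-equivalent
    orderings are NOT filtered out here (an assumption-level simplification).
    """
    all_configs = list(itertools.combinations(range(n_sites), n_occupied))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = min(n_samples, len(all_configs))
    return rng.sample(all_configs, k)


if __name__ == "__main__":
    # Hypothetical example: 8 Na sites, 4 occupied, 5 sampled configurations
    for config in sample_na_orderings(8, 4, 5):
        print(config)
```

Full enumeration only scales to small cells; for larger supercells one would sample site indices directly instead of listing every combination first.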


Spectroscopic characterization 

Soft XAS and high-resolution RIXS data were recorded at the I21 beamline
of Diamond Light Source in the UK, with supporting data from BL27SU
of the RIKEN/JASRI SPring-8 synchrotron in Japan and the ADRESS
beamline at the Swiss Light Source. Mn L-edge data were collected
in inverse partial fluorescence yield mode, and O K-edge data are
plotted in the partial fluorescence yield mode, both of which are
bulk-sensitive methods.

Extended Data Fig. 1 | Further structural characterization of pristine
materials. a,b, Neutron powder diffraction data for (a) Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂
and (b) Na₀.₆[Li₀.₂Mn₀.₈]O₂ refined using the P6₃/mmc space group, which
excludes superstructure ordering. Rietveld refinement was performed with


GSAS II software. Refinement parameters are given in the tables in the lower 
part of the figure, including the goodness of fit (G.O.F). Inductively coupled 
plasma (ICP) optical emission spectroscopy was used to confirm the chemical 
compositions. 

Extended Data Fig. 2 | Diffraction peaks arising from ribbon superstructure
ordering in Na₀.₆[Li₀.₂Mn₀.₈]O₂. a, PXRD data for pristine Na₀.₆[Li₀.₂Mn₀.₈]O₂
indexed using the P6₃/mmc space group, which does not account for
superstructure peaks arising from in-plane ordering in the TM layer.
b, Superstructure region of PXRD (horizontal axis 2θ in degrees) compared
with computer-generated diffraction patterns. Model crystal structures were
prepared with different alignments of ribbon-ordered layers. The only
structure to successfully match all the peaks is the P2₁/c space group.
Structures are all viewed along the [010] direction.

Extended Data Fig. 3 | Operando gas evolution analysis. OEMS collected on
Na₀.₆[Li₀.₂Mn₀.₈]O₂ at 10 mA g⁻¹ between 2 V and 4.5 V. No direct O₂ loss is
observed. Only a very small quantity of CO₂ is released at 3.5-4.2 V,
characteristic of alkali carbonate decomposition; a small amount is released at
4.5 V due to direct electrolyte oxidation. Overall, 0.005 moles of CO₂ per mole
of active material were detected during charge compared with 0.4 moles of
charge stored per mole of active material. Even if all of this CO₂ arose from O
loss from the lattice, it would constitute only a minimal contribution
(0.02 moles of charge stored per formula unit, f.u., or about 5%), to the charge
capacity observed.
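The upper-bound estimate in this caption can be reproduced with simple arithmetic. The sketch below assumes four electrons of charge per CO₂ molecule, a factor that is inferred here from the ratio of the quoted numbers (0.02 / 0.005) and is not stated explicitly in the caption.

```python
def charge_from_co2(mol_co2_per_fu, electrons_per_co2=4.0):
    """Upper-bound charge (mol e- per f.u.) attributable to lattice O lost as CO2.

    electrons_per_co2 = 4 is an assumption inferred from the quoted figures.
    """
    return mol_co2_per_fu * electrons_per_co2


if __name__ == "__main__":
    charge = charge_from_co2(0.005)   # mol e- per formula unit
    fraction = charge / 0.4           # relative to the 0.4 mol e- stored
    print(f"{charge:.3f} mol e- per f.u., {fraction:.0%} of the charge capacity")
```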

Extended Data Fig. 4 | Manganese L-edge spectra and low-resolution RIXS
for Na₀.₆[Li₀.₂Mn₀.₈]O₂. a, Electrochemical load curve for the first cycle of
Na₀.₆[Li₀.₂Mn₀.₈]O₂ showing state of charge points selected for ex situ analysis.
b, Manganese L-edge data collected in inverse partial fluorescence yield mode
show that Na₀.₆[Li₀.₂Mn₀.₈]O₂ remains unchanged at Mn⁴⁺ throughout the
charge and discharge cycle. Standards shown below are MnO (+2), Mn₂O₃ (+3)
and Li₂MnO₃ (+4). c, Low-resolution RIXS spectra collected at BL27SU, SPring-8
synchrotron, Japan, at 531 eV excitation energy show a new feature at an
emission energy of approximately 523 eV corresponding to new hole states
formed on O and an increase in the elastic peak intensity (labelled with arrows).
These new features disappear on discharge, indicating O reduction, and the
spectra are almost superimposable with those collected for the pristine
material. The intensity of both features appears much less pronounced than
the O redox features measured on honeycomb-ordered O-redox materials.

Extended Data Fig. 5 | Two-phase evolution in Li environment in
Na₀.₆[Li₀.₂Mn₀.₈]O₂. Ex situ ⁶Li MAS NMR spectra for Na₀.₆[Li₀.₂Mn₀.₈]O₂
collected at different states of charge illustrate the two-phase nature of the
charge-discharge plateau. Negligible change in the lithium environment is
observed below the end of the plateau on discharge in the single-phase region.
Arrows indicate the unique isotropic chemical shifts for Li.

[Extended Data Fig. 7 image. Panel a: relative energies of desodiated Na₀.₀Li₀.₂₅Mn₀.₇₅O₂ models P2/Li_TM, O2/Li_TM, O2/Li_AM and O2/Li_AM/Mn_dis. Panel c: honeycomb-ordered Na₀.₇₅[Li₀.₂₅Mn₀.₇₅]O₂, discharge reaction with 0.75 Na, ΔE = −2.43 eV, voltage = 3.2 V. Panel d: ribbon-ordered Na₀.₆[Li₀.₂Mn₀.₈]O₂, discharge reaction with 0.6 Na, ΔE = −2.46 eV, voltage = 4.1 V.]

Panel b:

Model             a (Å)   b (Å)   c (Å)   α (°)   β (°)   γ (°)
Pristine (DFT)    5.01    5.01    11.04   89.75   90.14   120.16
Pristine (exp.)   4.99    4.99    10.95   90.0    90.0    120.0
Deviation (%)     0.4     0.4     0.8     0.3     0.2     0.1
Charged (DFT)     5.0     5.0     10.38   90.07   89.99   120.1
Charged (exp.)    -       -       10.0    -       -       -
Deviation (%)     -       -       3.8     -       -       -
Discharged (DFT)  5.12    6.15    11.08   89.56   89.82   120.73

Extended Data Fig. 7 | Energetic stability afforded by O₂ formation and
computed discharge voltage. a, Energetics of possible configurations of
desodiated structural models for Na₀.₀Li₀.₂₅Mn₀.₇₅O₂. The models considered
are: P2-type stacking with Li in the TM layer (P2/Li_TM); O2-type stacking with Li
in the TM layer (O2/Li_TM); O2-type stacking with Li in the AM layer (O2/Li_AM); and
O2-type stacking with Li in the AM layer and with in-plane Mn disorder
(O2/Li_AM/Mn_dis). In the last case, various Mn disorder configurations were
investigated, corresponding to the different crosses. The lowest-energy structure is
pictured (ab-plane) and possesses clusters of vacancies and Mn-bound O₂ with
an O-O bond length of 1.2 Å corresponding to molecular O₂. For simplicity, the
energies of the optimized models are plotted relative to the energy of the
model P2/Li_TM, the energy of which was set to zero. The yellow curve is a guide
to the eye to indicate the models with the lowest total energies. b, Calculated
lattice parameters for pristine, charged and discharged Na₀.₇₅Li₀.₂₅Mn₀.₇₅O₂.
They are compared with experimental data. The deviation between theory and
experiment is also reported. c, d, Structural models used to compute the
change in energy and average voltage for Na₀.₇₅Li₀.₂₅Mn₀.₇₅O₂ (c) and
Na₀.₆Li₀.₂Mn₀.₈O₂ (d) respectively. Discharge reactions and calculated voltages
are given in (iii). Purple, Mn; green, Li; yellow, Na.


[Extended Data Fig. 8 image: PXRD data for P3-type Na₀.₆[Li₀.₂Mn₀.₈]O₂ (K. Du et al., Energy Environ. Sci. 9, 2575 (2016)) above calculated patterns labelled "Ribbon Ordered" and "Without TM Ordering"; horizontal axis 2θ (degrees); structure model with Na and Li sites labelled.]

Extended Data Fig. 8 | Ribbon ordering identified in P3-type Na₀.₆[Li₀.₂Mn₀.₈]O₂.
PXRD data for P3-type Na₀.₆[Li₀.₂Mn₀.₈]O₂ reproduced with permission from
ref.*. Below are calculated diffraction patterns for the P3 structure with and
[...] positions of superstructure peaks. The structure used for the calculation of the
ribbon-ordered P3-type Na₀.₆Li₀.₂Mn₀.₈O₂ diffraction pattern is shown to the
right and possesses an offset arrangement of ordered layers.

Extended Data Fig. 9 | Evolution of electrochemical behaviour over resting
and cycling. a, Electrochemical load curves for Na₀.₆[Li₀.₂Mn₀.₈]O₂ electrodes
charged to 4.5 V at a rate of 10 mA g⁻¹, then rested at open circuit voltage (OCV)

Extended Data Fig. 10 | Gradual loss of ribbon superstructure ordering from
diffraction. Ex situ PXRD patterns for Na₀.₆[Li₀.₂Mn₀.₈]O₂ in the discharged
state (2.0 V) after 1, 10 and 20 charge-discharge cycles (between 2.0 V and 4.5 V
at 10 mA g⁻¹). Peaks arising from the periodicity uniquely along the c-axis
remain sharp upon cycling (indexed as 002 and 004 according to the P6₃/mmc
space group without superstructure), whereas all other peaks, especially the
010 and 110, which are unique to ordering within the ab plane, broaden and
reduce in intensity as the ribbon superstructure is lost on cycling.


Article 


Molecular tuning of CO₂-to-ethylene conversion


https://doi.org/10.1038/s41586-019-1782-2 


Received: 21 December 2018 


Accepted: 1 October 2019 


Published online: 20 November 2019 


Fengwang Li'®, Arnaud Thevenon?*, Alonso Rosas-Hernandez2*, Ziyun Wang"”, Yilin Li'®, 
Christine M. Gabardo®, Adnan Ozden*, Cao Thang Dinh’, Jun Li'*, Yuhang Wang’, 
Jonathan P. Edwards’, Yi Xu*, Christopher McCallum*, Lizhi Tao’, Zhi-Qin Liang’, 
Mingchuan Luo’, Xue Wang’, Huihui Li’, Colin P. O’Brien®, Chih-Shan Tan’, Dae-Hyun Nam’, 
Rafael Quintero-Bermudez', Tao-Tao Zhuang’, Yuguang C. Li’, Zhiji Han?, R. David Britt’, 


David Sinton®, Theodor Agapie”*, Jonas C. Peters”* & Edward H. Sargent 


The electrocatalytic reduction of carbon dioxide, powered by renewable electricity,
to produce valuable fuels and feedstocks provides a sustainable and carbon-neutral
approach to the storage of energy produced by intermittent renewable sources’.
However, the highly selective generation of economically desirable products such as
ethylene from the carbon dioxide reduction reaction (CO₂RR) remains a challenge’.
Tuning the stabilities of intermediates to favour a desired reaction pathway can
improve selectivity⁵, and this has recently been explored for the reaction on copper
by controlling morphology®, grain boundaries’, facets®, oxidation state’ and
dopants”®. Unfortunately, the Faradaic efficiency for ethylene is still low in neutral
media (60 per cent at a partial current density of 7 milliamperes per square centimetre
in the best catalyst reported so far’), resulting in a low energy efficiency. Here we
present a molecular tuning strategy (the functionalization of the surface of
electrocatalysts with organic molecules) that stabilizes intermediates for more
selective CO₂RR to ethylene. Using electrochemical, operando/in situ spectroscopic
and computational studies, we investigate the influence of a library of molecules,
derived by electro-dimerization of arylpyridiniums", adsorbed on copper. We find
that the adhered molecules improve the stabilization of an ‘atop-bound’ CO
intermediate (that is, an intermediate bound to a single copper atom), thereby
favouring further reduction to ethylene. As a result of this strategy, we report the
CO₂RR to ethylene with a Faradaic efficiency of 72 per cent at a partial current density
of 230 milliamperes per square centimetre in a liquid-electrolyte flow cell in a neutral
medium. We report stable ethylene electrosynthesis for 190 hours in a system based
on a membrane-electrode assembly that provides a full-cell energy efficiency of 20
per cent. We anticipate that this may be generalized to enable molecular strategies to
complement heterogeneous catalysts by stabilizing intermediates through local
molecular tuning.
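Faradaic efficiency and partial current density together fix the total current density, since the partial current density is the product of the two. A minimal sketch of that relation follows, using the figures quoted in the abstract. The function name is illustrative, and the total current densities it returns are derived here, not reported values.

```python
def total_current_density(partial_ma_cm2, faradaic_efficiency):
    """Total current density (mA cm^-2) implied by a partial current density
    and the Faradaic efficiency (as a fraction between 0 and 1)."""
    if not 0.0 < faradaic_efficiency <= 1.0:
        raise ValueError("Faradaic efficiency must be in (0, 1]")
    return partial_ma_cm2 / faradaic_efficiency


if __name__ == "__main__":
    # Values from the abstract: this work, and the earlier best neutral-media catalyst
    print(round(total_current_density(230.0, 0.72), 1))
    print(round(total_current_density(7.0, 0.60), 1))
```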


Recently we found that an N-aryl-substituted tetrahydro-4,4′-bipyridine
organic thin film, formed by reductive electro-dimerization of an N-aryl
pyridinium additive (Fig. 1a; see Supplementary Information for details),
facilitated selective CO₂RR to multi-carbon products on Cu foils”. However,
the selectivity and partial current density for ethylene are too low
(about 40% and 0.5 mA cm⁻²) for practical applications. We sought to
clarify factors contributing to the selectivity enhancement to enable
further design of new functional molecules with better performance.

Noting that local environment plays a role in electrocatalysis through
tuning interactions among reactants/intermediates” ", we postulated
that the N-arylpyridinium-derived film may affect the selectivity of
CO₂RR by interacting with the reaction intermediate(s). To test this
hypothesis, we first prepared a library of N-arylpyridinium salts (1-11,
Fig. 1b, Supplementary Figs. 1 and 2) expected to display different
electronic properties. We then electrodeposited these N-arylpyridinium
precursors onto a porous polytetrafluoroethylene gas diffusion layer”
with a sputtered Cu layer serving as both current collector and catalyst.
The as-electrodeposited thin film is water-insoluble and consists of a
mixture of both constitutional isomers and stereoisomers of
N-aryl-substituted tetrahydro-bipyridine species (Fig. 1a, Supplementary
Note 1, Supplementary Figs. 3-5). As expected, Bader charge analysis
points to different electron-donating abilities of these
tetrahydro-bipyridines (Supplementary Fig. 6). Coating of the
tetrahydro-bipyridine film onto the Cu electrode does not substantially
change its morphology, crystallinity, electronics or wettability, nor does
it retard the transport of reactants, ions and products, which is needed in
electrocatalytic processes (Supplementary Note 2, Supplementary Figs. 7-10).

We evaluated CO₂RR properties of these tetrahydro-bipyridine-functionalized
electrodes in a liquid-electrolyte flow cell system
(Supplementary Fig. 11), using CO₂-saturated 1 M aqueous KHCO₃ as
the supporting electrolyte. In this system, the abundant
catalyst/electrolyte/CO₂ triple-phase interfaces overcome the CO₂
mass-transport limit’”"’ and thus enable commercially relevant current
densities””°. We note that, although the large achievable current densities
in the flow cell drive up local pH (Supplementary Fig. 12), the
tetrahydro-bipyridine layer does not create a further pH gradient near the
active Cu surface (Supplementary Note 2). The layer is chemically robust to
the locally alkaline environment (Supplementary Fig. 13). The Faradaic
efficiency (FE) for ethylene (Supplementary Table 1) on additive-modified Cu-x
electrodes (x = 1-11), at the optimal applied potentials, −0.82 V to
−0.84 V versus the reversible hydrogen electrode (RHE; all potentials
are with respect to this reference), was plotted against the Bader charge
of the nitrogen atom of each tetrahydro-bipyridine structure (Fig. 1c).
We found a volcano-shaped trend relating FE and Bader charge, with the
tetrahydro-bipyridine of moderate electron-donating ability showing
the highest ethylene selectivity.

Fig. 1 | Dimerization of N-arylpyridinium additives, and correlation of
ethylene selectivity with Bader charge. a, Reaction describing the
electro-dimerization process that converts an N-arylpyridinium salt to a
mixture of N-aryl-substituted tetrahydro-bipyridines. b, Molecular structures
of additives 1-11. OTf is trifluoromethanesulfonate. Cl and OTf are the
counter-ions of the derivatives. c, Trend for ethylene FE and calculated Bader
charge for the nitrogen atom of the N-aryl-substituted tetrahydro-bipyridines
prepared from 1-11. Owing to the symmetric molecular structure of the
tetrahydro-bipyridines, a hydrogen atom was used to replace half of the dimer
unit (see Supplementary Fig. 6 for details). A spread of Bader charges for the
nitrogen, covering the limiting values of the para,para and ortho,ortho
structures, was plotted. The circles correspond to the average contribution
from both the para,para and ortho,ortho isomers where their ratio could be
determined by ¹H NMR spectroscopy (see Supplementary Note 1 for details). The
error bars for ethylene FE uncertainty represent one standard deviation based
on three independent samples. The corresponding error bars for ethylene FE
uncertainty were arbitrarily placed in the middle of the limiting values for
those tetrahydro-bipyridines for which the para,para versus ortho,ortho ratio
could not be reliably determined by ¹H NMR spectroscopy.

We further found a volcano-shaped relationship between the ethylene
selectivity and the ratio of atop-bound CO (CO_atop) to bridge-bound
CO (that is, CO bound to two Cu atoms, hereafter CO_bridge) on
Cu-x surfaces (Fig. 2a). We identified and quantified these bound CO
configurations through in situ Raman spectroscopic interrogation” *
of these surfaces (Supplementary Note 3, Supplementary Figs. 14 and
15, Supplementary Table 2). In all cases, the ratio of CO_atop to CO_bridge
on Cu-x was increased relative to that on bare Cu. Noting a correlation
between ethylene selectivity and electron-donation propensity
(Fig. 1c), we hypothesized that the change of the relative population of
CO_atop and CO_bridge could arise from the difference in electron-donating
abilities of the tetrahydro-bipyridines. Indeed, we found that the ratio
of CO_atop to CO_bridge was positively correlated with the Bader charge of
the nitrogen atom in the tetrahydro-bipyridines (Fig. 2b). This finding
suggests that electron donation to the *CO stabilizes the atop CO more
than it does the bridge CO.

To gain molecular-level insight into the effect of CO binding, we
calculated, using density functional theory (DFT), reaction barriers
for the CO dimerization step, a critical step along the pathway to C₂
products? (that is, products with two carbon atoms, such as ethylene
and ethanol), on Cu(111) with the initial configurations of two *CO on the
atop:atop, atop:bridge and bridge:bridge sites (Fig. 2c, Supplementary
Fig. 16). We found the lowest barrier of CO dimerization to be at the
atop:bridge site with a barrier of 0.72 eV. In comparison, the barrier for
the bridge:bridge site is 0.82 eV. The barrier for the atop:atop site could
not be identified: one of the CO on an atop site tends to relocate to a bridge
site, suggesting that atop:atop is not favourable for CO dimerization.
These findings indicate that neither too large nor too small a population
of atop CO favours C₂ selectivity.

We further calculated the adsorption of CO on Cu(111) (Supplementary
Fig. 17, Supplementary Table 3). On bare Cu(111), the bridge site
appears to be the most stable adsorption site for CO. In the presence
of the tetrahydro-bipyridine formed from 1, the adsorption of CO on
both bridge and (especially) atop sites is enhanced, and the atop site
becomes favoured compared with the bridge site. The enhancement
of CO binding energy decreases the desorption of *CO and increases
the likelihood of further reduction of *CO to ethylene (Supplementary
Figs. 18-20).

We visualized the interaction between the tetrahydro-bipyridine
molecule and *CO through the electron density difference plot (Fig. 2d).
The electron density appears to transfer from the molecule to nearby
water molecules, changing the electronic distributions of water
surrounding *CO, and enhancing CO adsorption in the favourable atop site.

In sum, our working model is that H₂O-mediated electron density
transfer of the tetrahydro-bipyridine film to *CO stabilizes this
intermediate, especially on the atop site, and therefore promotes the
energy-favourable dimerization of bridge:atop bound CO, leading to enhanced
ethylene selectivity. However, too strong an adsorption of CO caused
by strong electron donation of some tetrahydro-bipyridines (right side
of the volcano plot in Fig. 1c) results in overload of atop-bound CO and
thus yields energy barriers too large for further reaction.

Fig. 2 | Mechanistic investigations of the stabilization of CO-bound
intermediates. a, The relationship between the ethylene FE and the ratio of
atop CO and bridge CO on Cu-x electrodes. The relative population of these
two kinds of Cu-bound CO was calculated through the integrated areas of each
band in the Raman spectra, which are proportional to the corresponding *CO
coverage (see Supplementary Note 3 for more details). The error bars for
ethylene FE uncertainty represent one standard deviation based on three
independent samples. b, The relationship between the ratio of atop CO to
bridge CO on Cu-x and the Bader charge for the nitrogen atom of the
N-aryl-substituted tetrahydro-bipyridine formed from additive x. The Bader
charges and associated uncertainty were calculated using the same protocol as
in Fig. 1. The error bars for the ratio of CO_atop to CO_bridge in a and b
represent one standard deviation based on two independent measurements.
c, Energy barriers of the dimerization of two CO at both bridge sites and two
CO at bridge and atop sites, respectively. IS, initial state; TS, transition
state; FS, final state. d, Plots of electron density difference for the CO
adsorption with one water layer and the tetrahydro-bipyridine formed from 1.
The yellow and blue contours represent electron density accumulations and
depressions, respectively. Dashed lines indicate hydrogen bond network.
Red, O; grey, C; blue, N; white, H; pink, Cu.
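To give a feel for what the 0.10 eV difference between the atop:bridge (0.72 eV) and bridge:bridge (0.82 eV) dimerization barriers quoted above could mean kinetically, the sketch below evaluates an Arrhenius-type rate ratio. It assumes equal pre-exponential factors for the two pathways and a temperature of 298.15 K, neither of which is stated in the text, so the result is an order-of-magnitude illustration only.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def rate_ratio(barrier_low_ev, barrier_high_ev, temperature_k=298.15):
    """Ratio of Arrhenius rates k_low / k_high for two activation barriers.

    Assumes identical pre-exponential factors (an illustrative simplification).
    """
    delta = barrier_high_ev - barrier_low_ev
    return math.exp(delta / (K_B_EV_PER_K * temperature_k))


if __name__ == "__main__":
    ratio = rate_ratio(0.72, 0.82)
    print(f"atop:bridge is favoured by a factor of about {ratio:.0f}")
```

Under these assumptions the lower barrier corresponds to a rate that is roughly fifty times faster at room temperature.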

We found, by using operando X-ray absorption spectroscopy (XAS, 
Supplementary Fig. 21), that tetrahydro-bipyridine does not modu- 
late the oxidation state or coordination environment of Cu—although 
such modulation is known to promote ethylene formation’. We also 
found, from in situ electrochemical electron paramagnetic resonance 
spectroscopic (EPR) and isotopic labelling studies (Supplementary 
Figs. 22-24), that tetrahydro-bipyridine does not mediate electron 
transfers via its conversion to pyridinium radicals’*”°, nor does it
mediate hydrogen-transfer steps.

Because the nitrogen atom of the N-aryl-substituted pyridine ring
influences the binding of *CO, we posited that an N-aryl-pyridinium-derived
molecule with more nitrogen sites and optimal electron-donating
properties would stabilize more *CO on the Cu surface. Accordingly,
we synthesized an N,N′-(1,4-phenylene)bispyridinium salt (12, Fig. 3a,
Supplementary Fig. 1). In contrast with 1-11, 12 underwent oligomerization
to form an N-aryl-dihydropyridine-based oligomer under electrodeposition
(Fig. 3a, Supplementary Fig. 5). The Bader charge of the
nitrogen atom of the oligomer (Supplementary Fig. 6) is close to that of
the tetrahydro-bipyridine from 1, and, as expected, the ratio of CO_atop
to CO_bridge on Cu-12 (Supplementary Fig. 15, Supplementary Table 2) is
also close to that on Cu-1. Based on the working hypotheses presented 
here, these findings suggest the Cu-12 catalyst should approach the 
top of the volcano plot.

Fig. 3 | CO₂RR performance in liquid-electrolyte flow cells. a, Reaction
describing the electro-oligomerization of the N,N′-(1,4-phenylene)
bispyridinium salt 12 to form an N-aryl-dihydropyridine-based oligomer. b, FE
of ethylene on Cu and Cu-12 using CO₂-saturated 1 M KHCO₃ as the supporting


We evaluated the CO₂RR performance of Cu-12 in the same flow cell
system. The ethylene FE on Cu-12 is higher than that on bare Cu and
other Cu-x across the entire applied potential range (−0.49 V to −0.84 V)
and achieves a peak value of 72% at −0.83 V (Fig. 3b, Supplementary
Tables 1 and 4), higher than previous selectivities reported for ethylene
in neutral media (Supplementary Table 5). In contrast, the ethylene FE
on bare Cu under similar conditions is below 40%. High selectivity and
high current density combine for an ethylene production current of
232 mA cm⁻² at −0.83 V (Supplementary Fig. 25).
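
As a rough illustration of how selectivity and current density combine, the sketch below uses the standard relation that a product's partial current density equals its Faradaic efficiency times the total current density. The total current density is not quoted in the text above, so the value printed here is only a back-calculation from the two reported numbers (72% and 232 mA cm⁻²); the function names are hypothetical.

```python
def partial_current_density(total_ma_cm2, faradaic_efficiency):
    """Partial current density of one product (FE given as a fraction, 0-1)."""
    return total_ma_cm2 * faradaic_efficiency


def implied_total_current_density(partial_ma_cm2, faradaic_efficiency):
    """Invert the relation above; an inference, not a reported measurement."""
    if not 0 < faradaic_efficiency <= 1:
        raise ValueError("Faradaic efficiency must be in (0, 1]")
    return partial_ma_cm2 / faradaic_efficiency


# Values reported in the text for Cu-12 at -0.83 V
implied_total = implied_total_current_density(232.0, 0.72)
print(f"Implied total current density: {implied_total:.0f} mA cm^-2")
```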

We examined the FEs of CO and ethylene across the applied poten- 
tial range. Although the FE of CO follows the same trend of peaking at 
moderate potentials, more CO is converted to ethylene on Cu-12 than 
on pure Cu (Fig. 3c, Supplementary Table 4). Specifically, at the applied 
potential of -0.83 V, the FEs of CO and ethylene on Cu-12 electrode are 
5% and 72%, respectively, whereas the values on bare Cu are 35% and 
37%, respectively (Supplementary Fig. 25). The FEs of other CO₂RR
products remain similar on both catalysts. These findings suggest 
that the increased ethylene selectivity arises primarily at the expense 
of CO evolution. This behaviour agrees with the in situ Raman spec- 
troscopy and DFT calculations, where the *CO is well stabilized for 
ongoing reduction on the molecularly functionalized Cu electrode. 


electrolyte. c, FEs of CO and ethylene on Cu and Cu-12 at the applied potential
range of −0.47 V to −0.84 V. The error bars for FE uncertainty represent one
standard deviation based on three independent samples.


We confirmed by isotopic CO₂ studies (Supplementary Fig. 26) that
the products were from CO₂RR.

To evaluate the potential of the Cu-12 catalyst for practical applica- 
tions, we integrated it into a membrane-electrode-assembly device 
(Supplementary Note 4, Supplementary Figs. 27-34) for electrosyn- 
thesis of ethylene through the overall reaction: 


$$2\,\mathrm{CO_2} + 2\,\mathrm{H_2O} \rightarrow \mathrm{C_2H_4} + 3\,\mathrm{O_2}; \qquad E^{\circ} = 1.15\ \mathrm{V}$$


where E° is the equilibrium potential for the reaction.

We operated the membrane-electrode-assembly system at a full-cell 
voltage of 3.65 V for 190 h. It exhibited a stable current (approximately 
600 mA) and a stable ethylene selectivity (64%) in neutral medium
(Fig. 4). The energy efficiency (EE) of the system is determined to be 
20% via: 


$$\mathrm{EE}_{\text{full-cell}} = \frac{E^{\circ} \times \mathrm{FE}_{\text{ethylene}}}{E_{\text{full-cell}}}$$

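
The 20% figure can be checked directly from the numbers quoted above (equilibrium potential 1.15 V, ethylene FE 64%, full-cell voltage 3.65 V). The short sketch below is only an arithmetic illustration of the energy-efficiency definition; the function name is hypothetical.

```python
def full_cell_energy_efficiency(e_standard_v, faradaic_efficiency, cell_voltage_v):
    """EE = (E° x FE_ethylene) / E_full-cell, with FE given as a fraction (0-1)."""
    if cell_voltage_v <= 0:
        raise ValueError("cell voltage must be positive")
    return e_standard_v * faradaic_efficiency / cell_voltage_v


# Values quoted in the text: E° = 1.15 V, FE = 64%, full-cell voltage = 3.65 V
ee = full_cell_energy_efficiency(1.15, 0.64, 3.65)
print(f"Full-cell energy efficiency: {ee:.1%}")
```

The result rounds to the 20% stated in the text.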
Overall, this work presents a strategy to tune the stabilization of
intermediates on heterogeneous electrocatalysts through the introduction
of organic molecules. Using this strategy, implemented with
N-aryl-substituted tetrahydro-bipyridine films and a related oligomeric
film on a Cu catalyst, we achieved CO₂-to-ethylene conversion with an
ethylene FE of 72% and a full-cell energy efficiency of 20% in neutral
media. In light of this performance, in combination with the long-term
operating stability, this is a promising strategy for the use of renewable
electricity to convert CO₂ into value-added chemicals, thus storing the
renewable energy (solar, wind) in the form of chemical energy.

Fig. 4 | Ethylene electrosynthesis in a membrane-electrode-assembly device.
The operating current and ethylene FE were monitored for the device. Cu-12
and iridium oxide supported on titanium mesh were used as the cathode and
anode, respectively. Humidified CO₂ was flowed through the gas channels in
the cathode, and 0.1 M aqueous KHCO₃ solution was flowed through channels
in the anode. The anode and cathode were separated by an anion exchange
membrane to form the membrane-electrode assembly. The total geometric
area of the flow field in the cathode is 5 cm², of which 45% is the gas channel
while the remaining 55% is the land area (Supplementary Figs. 27 and 28).
Full-cell voltage was gradually increased from 3 V to 3.65 V and kept constant
starting at time 0.


River deltas rank among the most economically and ecologically valuable
environments on Earth. Even in the absence of sea-level rise, deltas are increasingly 
vulnerable to coastal hazards as declining sediment supply and climate change alter 
their sediment budget, affecting delta morphology and possibly leading to erosion’ °. 
However, the relationship between deltaic sediment budgets, oceanographic forces 
of waves and tides, and delta morphology has remained poorly quantified. Here we 
show how the morphology of about 11,000 coastal deltas worldwide, ranging from 
small bayhead deltas to mega-deltas, has been affected by river damming and 
deforestation. We introduce a model that shows that present-day delta morphology 
varies across a continuum between wave (about 80 per cent), tide (around 10 per 
cent) and river (about 10 per cent) dominance, but that most large deltas are tide- and 
river-dominated. Over the past 30 years, despite sea-level rise, deltas globally have 
experienced a net land gain of 54 ± 12 square kilometres per year (2 standard
deviations), with the largest 1 per cent of deltas being responsible for 30 per cent of all 
net land area gains. Humans are a considerable driver of these net land gains—25 per 
cent of delta growth can be attributed to deforestation-induced increases in fluvial 
sediment supply. Yet for nearly 1,000 deltas, river damming* has resulted in a severe 
(more than 50 per cent) reduction in anthropogenic sediment flux, forcing a collective 
loss of 12 ± 3.5 square kilometres per year (2 standard deviations) of deltaic land. Not
all deltas lose land in response to river damming: deltas transitioning towards tide
dominance are currently gaining land, probably through channel infilling. With 
expected accelerated sea-level rise’, however, recent land gains are unlikely to be 
sustained throughout the twenty-first century. Understanding the redistribution of 
sediments by waves and tides will be critical for successfully predicting human-driven 
change to deltas, both locally and globally. 


River damming and land-use change affect the sediment supply to 
deltas, and can lead to substantial physical transformations of the 
coastal landscape. Existing attempts to predict delta morphology 
are conceptually rich but often qualitative® “. Most prominently, Gal- 
loway’ introduced a process-based ternary diagram, hypothesizing 
that delta morphology reflects the relative importance of wave, tide 
and river forcing. However, the lack of a quantitative prediction of 
delta morphology for a given relative influence of each forcing has 
prevented direct application of this foundational ternary diagram to 
understanding delta form. For example, how does decreased sediment 
supply affect deltas and how can this translate into land gain or land 
loss? A fundamental limitation in predicting delta change has been the
poor understanding of how sediment supply has shaped modern delta
morphology itself, motivating our development of an a priori theory
of the controls of delta morphology.


A new model for delta change


On the basis of two recent quantitative studies”, we here introduce
a ternary diagram that allows prognosis of delta morphology and
morphologic change using sediment fluxes (Fig. 1a). We apply this approach
on a global scale. First, we predict delta morphology for conditions
that resemble a world without substantial human impact on the fluvial
sediment supply. Next, we compare these predictions to the delta
morphology that is expected on the basis of recent modifications to
sediment fluxes due to both deforestation and river damming.


Fig. 1 | Global distribution of predicted pristine delta morphologies.
a, Galloway’ ternary diagram, recast to show the relative sediment fluxes Q_wave,
Q_tide and Q_river (see Methods). Insets are satellite images of representative delta
morphologies, with arrows highlighting the predicted direction and
magnitude of sediment fluxes. Map imagery in Figs. 1, 3 and Extended Data
Fig. 5 from NASA, Google Earth, TerraMetrics, 2019. b, Prediction of pristine
(Q_river^P) morphology of 10,848 deltas sized and coloured by fluvial sediment
flux. Axes follow a sigmoidal, rather than linear, function to better illustrate
the distribution of strongly wave-, river- or tide-dominated deltas. c, Global
geographic distribution of predicted pristine delta morphologies (see .kml file
at https://doi.org/10.17605/OSF.IO/S28QB). Plots in Figs. 1-3 and Extended
Data Figs. 1-5 generated by Matlab 2018b (https://mathworks.com/products/
matlab.html).


We distinguish between two formative values of the fluvial sediment
supply (Q_river, in kilograms per second), representing pristine sediment
fluxes before substantial anthropogenic influences (Q_river^P) and
contemporary (‘disturbed’) sediment fluxes accounting for dam construction
and land-use change in the contributing drainage basins (Q_river^D).
Because deltas respond to sediment flux changes over timespans of
decades to centuries”, our delta morphology predictions based on
Q_river^D correspond to a future equilibrium state towards which deltas
are currently evolving. Using observations of delta land area changes
in 1985-2015, we can investigate how much humans have changed
deltas and how deltas may change in the future.

Our ternary diagram compares the fluvial sediment supply to tide-
and wave-driven sediment fluxes near the river mouth. First, in the
absence of tides, a delta is expected to attain a wave-dominated,
triangular shape if the potential for waves to move sediment away from the
river mouth (Q_wave, in kilograms per second; see Methods) exceeds the
delivered fluvial sediment flux (Q_river). Importantly, Q_river and Q_wave enable
predictions independent of the observed delta morphology and allow
these sediment fluxes to be used for delta change forecasting. The ratio
Q_river/Q_wave (termed the fluvial dominance ratio, R) indicates whether
a delta does not deflect the coastline (R = 0; for example, Eel; Fig. 1a),
has a roughly triangular shape with a shoreline angle between 0° and
45° (0 < R < 1; for example, Grijalva), or is river-dominated (R > 1; for
example, Mississippi). Increases in R lead to increased deposition near
the river mouth, whereas decreases in R can result in distal shoreline
progradation even as the river mouth erodes”.

In the absence of waves, delta morphology is determined by the 
competition between river discharge and tidal flows. Morphologi- 
cally, tidal dominance manifests itself as a seaward widening of the 
channel banks“. By contrast, river-dominated delta channels have 
an approximately constant width. The tidal dominance ratio T, as
originally defined”, relates the tidal discharge amplitude to the mean
fluvial discharge. Here we use T as a ratio of sediment fluxes and define
a tidal sediment flux (Q_tide, in kilograms per second) along with a fluvial
sediment flux (Q_river, in kilograms per second) (Fig. 1a, Methods).
If T < 1, the delta is river-dominated and there is no flow reversal in the
deltaic channel(s). If T > 1, the delta is tide-dominated and the widened
deltaic channel(s), or some portion thereof, experience(s) flow reversal.
Changes in T will affect delta channels; for example, a decrease in
fluvial sediment flux (Q_river) will cause the channel to infill and narrow”.
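
The two ratios described above can be illustrated with a minimal sketch. It assumes R = Q_river/Q_wave and T = Q_tide/Q_river with all fluxes in kilograms per second, and uses the thresholds given in the text (R > 1 and T > 1). The function names, the handling of zero fluxes and the treatment of values exactly equal to 1 are assumptions made for illustration only.

```python
def fluvial_dominance_ratio(q_river, q_wave):
    """R = Q_river / Q_wave; returns infinity when there is no wave transport."""
    if q_wave <= 0:
        return float("inf")  # assumption: without waves the river always dominates
    return q_river / q_wave


def tidal_dominance_ratio(q_tide, q_river):
    """T = Q_tide / Q_river; returns infinity when there is no fluvial flux."""
    if q_river <= 0:
        return float("inf")  # assumption: without a river supply tides always dominate
    return q_tide / q_river


def describe_delta(q_river, q_wave, q_tide):
    """Return (R, T, shoreline description, channel description)."""
    r = fluvial_dominance_ratio(q_river, q_wave)
    t = tidal_dominance_ratio(q_tide, q_river)
    shoreline = "river-dominated" if r > 1 else "wave-shaped shoreline"
    channels = "tide-dominated, flow reversal" if t > 1 else "river-dominated, no flow reversal"
    return r, t, shoreline, channels


# Hypothetical fluxes in kg/s, chosen only to exercise both ratios
r, t, shoreline, channels = describe_delta(q_river=30.0, q_wave=100.0, q_tide=60.0)
print(r, t, shoreline, channels)
```

Note that the text introduces R for the case without tides and T for the case without waves; evaluating both for one delta, as done here, is a simplification.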

Our ternary diagram represents the relative contribution of Q_tide,
Q_river and Q_wave, and therefore also two morphological attributes of a
delta: the seaward divergence of the channel banks and the shoreline
protrusion angle (Fig. 1a). It allows us to explore delta morphologies
that arise from varying Q_tide, Q_river and Q_wave, including the expected
morphology of deltas near the limit of low fluvial sediment flux, now
or in the future”. Deltas near this limit are often referred to as
strandplains (for example, São Francisco’) or alluvial estuaries (for example,
Elbe’). Here we show that this wide variety of coastal morphologies
with different sizes lies along a continuum that can be characterized
by the relative balance of these three sediment fluxes. For simplicity,
we therefore refer to all morphologies within our ternary diagram as
deltas, a broader definition than in other studies”.

Fig. 2 | Predicted delta morphologic change from pristine to future
equilibrium conditions. a, Arrows indicate the direction and magnitude of the
predicted change. Colour and thickness indicate the pristine fluvial sediment
flux. b, Predicted anthropogenically driven morphologic change for a selection
of well-known deltas. See also Extended Data Table 4.
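
A position in such a ternary diagram can be sketched by normalizing the three fluxes so that they sum to one. The snippet below is a minimal illustration under the assumption that a delta is labelled by whichever flux is largest; the function names are hypothetical and the sigmoidal axis scaling used in Fig. 1b is not reproduced.

```python
def ternary_fractions(q_river, q_wave, q_tide):
    """Relative contribution of each sediment flux (fractions summing to 1)."""
    total = q_river + q_wave + q_tide
    if total <= 0:
        raise ValueError("at least one flux must be positive")
    return {"river": q_river / total, "wave": q_wave / total, "tide": q_tide / total}


def dominant_forcing(q_river, q_wave, q_tide):
    """Label a delta by its largest flux (assumed classification rule)."""
    fractions = ternary_fractions(q_river, q_wave, q_tide)
    return max(fractions, key=fractions.get)


# Hypothetical fluxes in kg/s
fractions = ternary_fractions(10.0, 50.0, 5.0)
label = dominant_forcing(10.0, 50.0, 5.0)
print(fractions, label)
```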


A global assessment of delta change


To predict pristine delta morphology globally, we determined the
location of coastal deltas worldwide (n = 10,848 ± 494; 2 s.d.) and calculated
pristine river-, wave- and tide-driven sediment fluxes. These fluxes
occur in all combinations, and the predicted delta morphologies vary
across a continuum between wave, tide and river dominance, as tested
against observed morphologies (see Methods). Most deltas are
wave-dominated (~79% ± 9%; 2 s.d.); however, large deltas (Q_river^P > 50 kg s⁻¹,
n = 701) are predominantly (68%) river- or tide-dominated (Fig. 1b),
owing to their large fluvial sediment flux and their low-gradient delta
plains (5 × 10⁻⁴ versus 3 × 10⁻³ for all deltas on average), making them
conducive to large tidal sediment fluxes”. River- and tide-dominated
deltas are associated with 83% of the modern fluvial discharge and 87%
of the modern sediment flux to the global ocean.

Fig. 3 | Rates and drivers of delta land area change over the period 1985-2015.
a, b, Land area change rates related to changes in the fluvial sediment supply (a)
and pristine delta morphology (b). c, d, Land change in the Nile Delta, Egypt (c)
and the Ord River Delta, Australia (d). Map imagery, NASA, Google Earth,
TerraMetrics, 2019 and ref.*°. The inset diagrams indicate the predicted
morphologic change.

A comparison of equilibrium predictions for pristine and
disturbed sediment fluxes shows the extent to which humans are likely
to be modifying delta morphology by influencing river discharge and
sediment fluxes. In total, 970 deltas have had their fluvial sediment
supply reduced by >50%, collectively from ~9 × 10⁴ kg s⁻¹ to ~2 × 10⁴ kg s⁻¹,
resulting in a shift towards wave or tide dominance (Fig. 2a). On the
other hand, human-driven soil erosion, mostly through deforestation,
is predicted to have caused a >50% increase in sediment flux, or
~5 × 10⁴ kg s⁻¹, to ~1,500 deltas. We predict that sediment supply changes
are forcing considerable ongoing adjustments in the shoreline
protrusion and channel width of many well-known deltas (Fig. 2b).

Next, we use the Aqua Monitor” to investigate how our predicted
ongoing morphologic change is reflected in recent delta surface area
change (see Methods). We find that over the past 30 years, deltas
globally have gained 181 ± 8.3 km² yr⁻¹ and lost 127 ± 8.3 km² yr⁻¹, resulting
in a net gain of 54 ± 11.8 km² yr⁻¹ (2 s.d.). With a combined ~9 × 10⁹ m³ yr⁻¹
fluvial sediment flux to the global ocean”, deltas on average require
150 m³ of sediment delivered to the coast for every square metre of
land gain. Delta growth is particularly pronounced for tide-dominated
deltas, representing 46% of the net land gain.
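
The net-gain figure and its uncertainty can be approximated from the gross rates quoted above. The sketch below assumes that the gain and loss uncertainties are independent and combine in quadrature; that is a common convention and not necessarily the authors' procedure. With the rounded inputs it gives a value close to, but not exactly, the reported ±11.8 km² yr⁻¹.

```python
import math


def net_land_change(gain, loss, gain_err, loss_err):
    """Net change and its uncertainty, assuming independent errors (quadrature)."""
    net = gain - loss
    err = math.hypot(gain_err, loss_err)
    return net, err


# Gross rates quoted in the text (km^2 per year, 2 s.d. uncertainties)
net, err = net_land_change(gain=181.0, loss=127.0, gain_err=8.3, loss_err=8.3)
print(f"Net land gain: {net:.0f} +/- {err:.1f} km^2 per year")
```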

We find that humans have measurably altered delta growth rates
globally (Fig. 3a, Table 1). Human-induced changes to the fluvial
sediment flux (Q_river^D − Q_river^P) explain 16% of the recent delta land area
changes (P = 0). Deforestation has led to land gain, thus far exceeding
land loss due to river dams. Delta change is most pronounced in South,
Southeast and East Asia, where 57% of all new deltaic land is gained and
61% of all delta land loss occurs. North America, owing to the rapid
decline of the Mississippi Delta, partly due to damming”, is the only
continent with a net decrease in deltaic area (Extended Data Table 3).


Table 1 | Global delta morphology and morphodynamic change

                              Number of   Total Q_river^P   Total Q_river^D   Land gain     Land loss     Net land gain
                              deltas      (kg s⁻¹)          (kg s⁻¹)          (km² yr⁻¹)    (km² yr⁻¹)    (km² yr⁻¹)
Wave-dominated                8,552       6.0 × 10⁴         5.9 × 10⁴         35 ± 7        −17 ± 7       19 ± 10
River-dominated               1,169       20 × 10⁴          15 × 10⁴          49 ± 3        −39 ± 3       10 ± 4
Tide-dominated                1,127       22 × 10⁴          22 × 10⁴          97 ± 3        −72 ± 3       25 ± 4
Fluvial flux decrease (>50%)  970         9.2 × 10⁴         1.8 × 10⁴         15 ± 3        −27 ± 3       −12 ± 4
Fluvial flux increase (>50%)  1,478       3.1 × 10⁴         7.6 × 10⁴         36 ± 3        −14 ± 3       25 ± 4
Tidal reworking*              234         4.2 × 10⁴         1.0 × 10⁴         2 ± 1         −1 ± 1        0.9 ± 1
Wave reworking†               736         5.0 × 10⁴         0.8 × 10⁴         12 ± 2        −25 ± 2       −13 ± 3
Largest 1%                    108         35 × 10⁴          29 × 10⁴          103 ± 1       −88 ± 1       15 ± 1
Largest 10%                   1,085       46 × 10⁴          40 × 10⁴          143 ± 3       −109 ± 3      34 ± 4
Largest 100% (all deltas)     10,848      49 × 10⁴          43 × 10⁴          181 ± 8       −127 ± 8      54 ± 12

Error limits indicate 2 s.d.
*Tidal reworking defined as a fluvial sediment flux decrease greater than 50% and Q_wave < Q_tide.
†Wave reworking defined as a fluvial sediment flux decrease greater than 50% and Q_wave > Q_tide.

Delta response to river damming depends on how waves and tides 
redistribute (rework) deltaic sediment (Fig. 3b). Two dominant patterns 
emerge. Deltas that are predicted to become more wave-dominated 
are, on average, eroding (Table 1). Morphologically, this change is 
expected because wave reworking of the delta near the river mouth 
results in erosion” (Fig. 3c). However, tidally influenced deltas that 
face markedly reduced fluvial sediment supply are slightly gaining (or 
not necessarily losing) land area (Table 1, Fig. 3d). This counterintui- 
tive result is caused by the infilling of deltaic channels”. In contrast to 
some studies (for example, in the Amazon” or Yangtze’’) that assume
that dams will lead to delta erosion, our analysis suggests that tides 
can overcompensate for the reduced fluvial discharge or sediment 
input and increase landward sediment transport. Increased landward 
transport probably results from the relative enhancement of tidal flood 
flow in cases where fluvial discharge (peaks) are decreasing”*”’ and 
comes at the expense of the extensive subaqueous delta. 


Discussion 


Because our predictions of delta morphologic change are global in 
scale, they exclude various processes affecting deltas now and in the 
future, such as relative sea-level change and direct anthropogenic
modification, processes that are included in measurements of land area change. For
heavily modified delta plains (for example, the Rhine-Meuse Delta), 
morphologic predictions based on changes in the fluvial sediment 
flux can indicate long-term system tendencies; however, the actual 
response will most probably involve direct human-delta interactions 
not considered by our approach. 

Our ternary diagram simplifies delta morphology into two shape 
metrics: delta protrusion angle and channel width. It therefore differs 
from earlier, qualitative work. For example, the São Francisco river is
often thought of as having an end-member wave-dominated delta’.
Here we show that the delta is wave-dominated, but that fluvial
sediment has created a substantial shoreline protrusion (R = 0.3) and that
tides probably create flow reversal at the river mouth (T ≈ 1). We note
also that two deltas that are placed near each other in our framework
(for example, Volga and Huanghe; Fig. 2b) might be considered to be
different on the basis of other aspects of delta morphology (for example,
shoreline rugosity, number of distributary channels). Our ternary
diagram can help explore the origin of such morphologic differences.
For example, Q_river is split across distributary channels, whereas Q_wave
and Q_tide act on each river mouth. Via channel bifurcation, deltas that
are marginally river-dominated can therefore transition towards wave
or tide dominance”. Conversely, because Q_wave suppresses channel
bifurcation”®, we could potentially predict the number of distributary
channels for river deltas.

Changes to sediment fluxes explain dominant trends in delta plan- 
form evolution and are sufficiently general to allow for coupling with 
other processes. Sea-level rise and subsidence, for example, tend to 
increase deltaic channel and topset aggradation”, which would reduce
fluvial sediment supply to the river mouth (Q_river) and result in a relative
increase of wave and tide dominance. Other controls on delta
morphology, such as grain size or wave climate changes”, can be incorporated
into our model, but appropriate data for global applications are
currently lacking. For example, grain size is inversely correlated to Q_tide
and Q_wave (refs. ?"?), making coarser-grained deltas more likely to be
river-dominated. 

In conclusion, we can successfully predict large-scale delta morphology
and we find that human intervention in drainage basins has had a
considerable global effect. The recent reductions in sediment supply
explain important patterns of land loss in cases where waves take over.
Yet on a global scale, land gains resulting from deforestation exceed
losses due to river damming. In the future, however, dam emplacement
and sand mining are projected to accelerate in developing nations,
further lowering fluvial sediment supply to river deltas”. Sea-level rise
and land subsidence rates are increasing in many deltas*?**. Future
predictions of delta morphology therefore will need to consider
further diminished sediment loads and higher relative sea-level rise rates.


Methods


We predict delta morphology and delta morphologic change by cal- 
culating potential sediment transport fluxes due to waves, tides and 
the river. We obtain delta land area change by summing land gain and 
land loss from recent global surface-water change studies”°*>. Our 
method involves the following seven steps, including estimates of 
uncertainty: (1) locating coastal river deltas globally, (2) obtaining the 
pristine and disturbed fluvial sediment flux for each delta, (3) calcu- 
lating the wave-driven and (4) the tide-driven sediment flux for each 
delta, (5) producing a morphological prediction for each delta, (6) 
testing the morphological prediction and (7) obtaining rates of delta 
land area change. 


Locating river deltas 

We locate coastal deltas using HydroSheds at a resolution of 15 arc- 
sec for all coasts south of 60° latitude*®. HydroSheds uses hydrologi- 
cally conditioned Shuttle Radar Topography Mission (SRTM)” data 
to generate gridded hydrologic data such as drainage direction 
and flow accumulation, and includes locations of river mouths 
globally. 

The 15-arcsec HydroSheds dataset contains about 2.48 million
first-order drainage basins; 85% of those are smaller than 1 km²
(ref. *8). Most of these small drainage basins have no river’®, and
therefore also no delta. They appear mostly along coastlines because of
elevation noise that leads to poor drainage delineation of flat, low-lying
areas” (Extended Data Fig. 1). For studies that focus on rivers, a common
solution to this problem is to limit the analysis to drainage basins
larger than a certain size (for example, 40,000 km²)". Unfortunately,
this solution is not appropriate for our purposes because it would
exclude many of the smaller deltas. Instead, we select river mouths
with a drainage area of at least 50 km² if it contains a drainage divide
higher than 40 m above mean sea level. We also include drainage basins
larger than 1,000 km² regardless of the drainage basin topography.
Accounting for drainage area elevations in small basins allows us to
exclude most of the coastal noise caused, for example, by vegetation,
but still captures many small, mountainous drainage basins. We find
drainage divide elevations for all river mouths from our initial selection
by extracting the SRTM elevation along each drainage basin boundary
(Extended Data Fig. 1).
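
The basin-selection rule described above can be written as a small predicate. This is a minimal sketch using the thresholds stated in the text (50 km², 40 m and 1,000 km²); the function and argument names are hypothetical, and reading the basin area and the highest drainage-divide elevation from HydroSheds and SRTM is assumed to happen elsewhere.

```python
def is_candidate_river_mouth(drainage_area_km2, max_divide_elevation_m):
    """Apply the drainage-area and drainage-divide criteria from the text."""
    # Large basins are always kept, regardless of topography.
    if drainage_area_km2 > 1000.0:
        return True
    # Smaller basins must be at least 50 km^2 and have a drainage divide
    # higher than 40 m above mean sea level.
    return drainage_area_km2 >= 50.0 and max_divide_elevation_m > 40.0


# Hypothetical basins: (area in km^2, highest divide elevation in m)
examples = [(0.8, 3.0), (60.0, 12.0), (60.0, 250.0), (5000.0, 8.0)]
selected = [basin for basin in examples if is_candidate_river_mouth(*basin)]
print(selected)
```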

For latitudes greater than 60°, where HydroSheds is not available,
we find deltas by selecting drainage basins larger than 1,000 km² based
on the 1-min ETOPO1* grid, which is available globally. We eliminate
non-coastal deltas by only selecting potential delta-mouth locations
closer than 12 arcmin to the National Oceanic and Atmospheric
Administration (NOAA) shoreline (~15 km, depending on the latitude)”.

To further improve our dataset and include only alluvial river
mouths, we use the WBMSed 2.0 distributed global-scale sediment
flux model’*” and retrieve river discharge and sediment flux for each
river mouth (see Methods section ‘Fluvial sediment flux Q_river’). We
remove river mouths with a river discharge below 1 m³ s⁻¹ or a sediment
flux below 0.01 kg s⁻¹ (arid environments). We use the global coastal
typology dataset of Dürr et al.** to further remove drainage basins
smaller than 1,000 km² that drain into fjords, where R and T are unlikely
to be appropriate indicators of their morphology. Our resulting dataset
consists of 10,848 deltas on all major landmasses except Antarctica
and Greenland.

We investigate whether our criteria lead to the inclusion of most 
coastal deltas globally by creating a test dataset of deltas on Mada- 
gascar. Madagascar has a wide range of wave exposure, tidal ampli- 
tudes and, consequently, coastal environments. Using 1-m-resolution 
DigitalGlobe images we visually identify 306 river mouths, of which 
236 appear deltaic (where the coastal morphology is affected by the
presence of a river; see .kml file at https://doi.org/10.17605/OSF.IO/
S28QB). Of the 236 deltas, our algorithm finds 212, and 24 deltas were
not located (false negatives, generally small deltas). Our dataset also
includes 12 drainage basins that do not have a delta (false positives); 
these tend to be tributaries to other rivers with confluences near the 
coast, or small drainage basins without an observable river. We include 
bayhead deltas in our dataset. 

Our test dataset allows us to compute the uncertainty on the global 
number of deltas (Extended Data Table 1). Combined, our assessment 
indicates an accuracy of 85%. By extrapolating globally outside Mada- 
gascar and following Olofsson et al.**, we obtain a standard deviation 
of 252 and 95% confidence bounds of +494. Because our false-negative 
and false-positive rates are comparable, our estimate of 10,848 coastal 
deltas is unlikely to be strongly biased“. 
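
The Madagascar test counts quoted above (212 deltas found, 24 missed, 12 false positives) can be turned into simple detection scores. In the sketch below, the "accuracy" definition, correct detections divided by all correct, missed and spurious entries, is an assumption chosen because it reproduces the stated 85%; the text does not spell out the formula. The 95% bound is approximated as 1.96 standard deviations, which is likewise an assumption.

```python
def detection_scores(true_positives, false_negatives, false_positives):
    """Recall, precision and a combined accuracy for a detection test."""
    actual = true_positives + false_negatives      # deltas that really exist
    detected = true_positives + false_positives    # entries the algorithm reports
    return {
        "recall": true_positives / actual,
        "precision": true_positives / detected,
        # Assumed definition: hits / (hits + misses + false alarms)
        "accuracy": true_positives / (actual + false_positives),
    }


scores = detection_scores(true_positives=212, false_negatives=24, false_positives=12)
half_width_95 = 1.96 * 252  # standard deviation of 252 deltas, as stated in the text
print(scores, round(half_width_95))
```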


Fluvial sediment flux Q_river

To estimate the fluvial sediment flux for every delta, we use the WBMSed 
2.0 distributed global-scale sediment flux model’**”. WBMSed is an 
empirical model that calculates gridded daily fluvial water discharge on 
the basis of precipitation, temperature, soil type, elevation and other 
datasets, in this case for the years 1980-2010. Sediment discharge is 
then estimated using the BQART model*. 

WBMSed is available globally at a resolution of 6 arcmin, which is
lower than that of the HydroSheds data. We therefore convert the
WBMSed accumulated discharge and sediment flux file to a discharge
and sediment yield (Extended Data Fig. 2). We then sum the discharge
and sediment yield across the drainage basins to calculate a discharge
and Q_river for each delta.

WBMSed accounts for human influences on fluvial sediment fluxes 
by including empirically tested trapping coefficients for river dams 
and human erosion parameters to account for land-use changes. By 
disabling these coefficients, WBMSed can estimate fluvial sediment 
fluxes for a world without humans”. We use ‘pristine’ (without humans) 
and ‘disturbed’ (with humans) model results from Cohen et al.” to 
investigate human-induced changes to delta morphology (Extended 
Data Fig. 3). We note that pristine conditions can refer to different
time periods, depending on the history of anthropogenic change in each
drainage basin. The Mekong River Delta, for example, has had
a long history of human impact on its fluvial sediment flux*®. Disturbed
conditions refer to the present day and include the effects of
afforestation and improved soil conservation practices on the fluvial sediment
flux to river deltas. WBMSed is validated by independent measurements
of the fluvial sediment flux of pristine and disturbed drainage
basins”. We note that both realizations are based on the 1980-2010
hydroclimate, so we exclude the effects of longer-term climate change
on the fluvial sediment flux.

WBMSed provides a reasonable prediction of sediment discharge
as tested against observations (R² = 0.66)". Sediment flux estimates
remain challenging; therefore, predictions might differ from local 
case studies, both for pristine and for disturbed river basin conditions. 
WBMSed data should be considered estimates. 


Wave sediment flux Q_wave

To assess ocean wave effects on delta morphology, we calculate the
maximum potential alongshore sediment flux Q_wave (ref. ”) for every
delta using the NOAA WaveWatch III 30-year hindcast phase II*” by
extracting the angular distribution of the wave energy, the significant 
wave height and the wave period (Extended Data Fig. 4). The resolution 
of the wave data varies between 4 arcmin and 30 arcmin depending on 
location and bathymetric complexity. We extract the closest available 
wave data for each delta. 

We calculate Q,,,,. by convolving the angular distribution of wave 
energy with an approximation of alongshore sediment transport 
recasted into deep-water wave properties 


Ovave= max | E(9,)2,(9~6)|- min [E(9)2,(9-2)] 
where E (dimensionless) is the relative contribution of each wave 
approach angle φ_0 to alongshore sediment transport. Q_s (in kilograms 
per second) represents wave-driven alongshore sediment transport 
posed in deep-water terms as a function of the approach angle of the 
wave, φ_0, compared to the shoreline orientation θ (refs. ”48). We do not have global 
data of shoreline orientation, and therefore calculate Q_wave by assuming 
maximum potential transport to the left and the right, away from the 
river mouth”. Given that most of the wave energy is directed towards 
the coast (not away from the coast), this is unlikely to be a major 
component of the uncertainty. 
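
The max-minus-min calculation described above can be sketched numerically. This is a minimal illustration and not the authors' Matlab implementation: the functional form of Q_s (a deep-water CERC-type relation proportional to H^(12/5) T^(1/5) cos^(6/5) sin), the coefficient `k2`, the default wave height and period, and the 1° grid of candidate shoreline orientations are all assumptions introduced here.

```python
import numpy as np


def qs_deep_water(rel_angle, height=1.0, period=8.0, k2=1.0):
    """Alongshore transport for a relative wave angle a (radians).

    Assumed form: k2 * H^(12/5) * T^(1/5) * cos^(6/5)(a) * sin(a),
    set to zero for waves travelling away from the shore (|a| >= 90 deg).
    """
    a = np.asarray(rel_angle, dtype=float)
    cos_a = np.where(np.abs(a) < np.pi / 2, np.cos(a), 0.0)
    return k2 * height**2.4 * period**0.2 * cos_a**1.2 * np.sin(a)


def q_wave(wave_angles, energy, n_orientations=360):
    """Max minus min of the net transport over all shoreline orientations."""
    phi0 = np.asarray(wave_angles, dtype=float)
    e = np.asarray(energy, dtype=float)
    e = e / e.sum()  # E: relative contribution of each approach angle
    theta = np.linspace(-np.pi, np.pi, n_orientations, endpoint=False)
    rel = phi0[None, :] - theta[:, None]       # phi_0 - theta for every pair
    rel = (rel + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    net = (e[None, :] * qs_deep_water(rel)).sum(axis=1)
    return net.max() - net.min()
```

Taking the maximum and minimum over every orientation mirrors the stated assumption of maximum potential transport to the left and to the right of the river mouth when the true shoreline orientation is unknown.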

Our analysis assumes that waves refract and shoal over shore-parallel 
contours”“*’ and that the delta is exposed to waves from all directions. 
Complex nearshore bathymetry and shadowing by headlands can have a 
considerable effect on wave transformations, but cannot be accounted 
for in this global model. We therefore assume that if wave data are found 
within 1° of the river mouth, the delta is not sheltered from wave attack. 
We assume negligible wave-driven sediment transport if the delta is 
located farther than 1° from available wave data (sheltered, mostly 
bayhead deltas). This cutoff could falsely identify some bayhead deltas 
as wave-dominated, whereas other open-coast deltas might be labelled 
river-dominated owing to the coarse WaveWatch III grid resolution. We 
note that this is an important simplification that should be improved 
upon in the future. 

The fluvial dominance ratio R compares the wave-driven flux Q_wave 
to the fluvial sediment that is retained nearshore. WBMSed predicts 
fluvial suspended load sediment fluxes, of which a large fraction will 
probably be lost to the marine environment. Bedload fluxes are more 
likely to be retained nearshore, but no global data exist to predict these 
fluxes. Here we assume that WBMSed approximates the fluvial sediment 
load that is retained nearshore. This assumption will most probably 
lead to an underestimation of wave dominance for larger, suspended- 
load dominated rivers and an overestimation of wave dominance for 
smaller, bedload dominated rivers. 

The fluvial dominance ratio R is dependent on the number of distributary 
channels. The potential alongshore transport Q_wave acts on each 
river mouth, whereas Q_river is split between river mouths”. Because no 
global data on distributary channel networks exist, we neglect the effect 
of distributary formation on Q_river, and therefore might underpredict 
wave influence on deltas with multiple distributaries (for example, 
Mekong Delta*’). 


Tidal sediment flux Q_tide 

We calculate Q_tide for every coastal delta to establish the effect of tides 
on delta morphology. Q_tide is a tidal sediment flux amplitude at the 
mouth of a delta. If Q_tide is large compared to Q_river, we predict considerable 
channel widening compared to the upstream (fluvial) channel 
width. Q_tide requires estimates of the tidal amplitude, angular frequency, 
channel cross-sectional aspect ratio and channel slope”. We extract the 
tidal amplitude and angular frequency of 13 tidal constituents glob- 
ally for all deltas using the 15-arcsec-resolution OSU TOPEX dataset”? 
(Extended Data Fig. 5). We define the mean tidal amplitude as half of the 
sum of all tidal constituents and use either a semi-diurnal or a diurnal 
frequency, depending on the delta location. 

We estimate the channel slope from the HydroSheds accumulated 
drainage area data (ACC files)*° and the global SRTM data” by tracking 
the elevation upstream from every delta up to 20 m above the mean 
sea level (Extended Data Fig. 1b). We then fit an exponential function 
to the elevation data and calculate the gradient of that function at sea 
level”. We assume a slope of 1x 10° (median slope of all coastal deltas) 
if SRTM elevation data are missing (>60° latitude) or if its resolution is 
insufficient to capture the water-surface elevation of deltas. 

Nienhuis et al. defined tidal dominance as the ratio of tidal discharge 
amplitude (Q_w,tide, in cubic metres per second) and the mean annual 
river discharge (Q_w,river, in cubic metres per second). To compare tidal 
dominance to wave dominance, here we define an equivalent tidal 
sediment flux Q_tide by assuming that the sediment concentration of 
the tidal discharge is equal to the sediment concentration of the river 
discharge. We estimate Q_tide as 

$$Q_{\mathrm{tide}} = Q_{w,\mathrm{tide}}\,\frac{Q_{\mathrm{river}}}{Q_{w,\mathrm{river}}} \qquad (2)$$

such that the ratio T in discharge terms is equivalent to the ratio posed 
in sediment fluxes. We calculate Q_w,tide by 


$$Q_{w,\mathrm{tide}} = \frac{1}{2}\,\omega k a^{2} \left(\frac{d_u}{S}\right)^{2} \beta \qquad (3)$$


where ω is the tidal angular velocity (s⁻¹); k is a proportionality coefficient 
(m⁻¹) that is dependent on the grain size, Shields stress and flow 
roughness”; a is the mean tidal amplitude (m) (Extended Data Fig. 5); 
d_u is the upstream channel depth (m); S is the channel slope; and β is 
the channel aspect ratio. We estimate the aspect ratio and depth of 
each river based on its discharge following hydraulic geometry”. Q_tide 
has been tested for a broad selection of deltas globally and was found 
to be an appropriate indicator of tidal dominance in a broad range of 
wave environments”. 


Combining Q_river, Q_tide and Q_wave 

To estimate the location of deltas within a ternary diagram we deter- 
mine the fraction r of the total sediment flux contributed by waves, 
tides and the river 


$$r_x = \frac{Q_x}{Q_{\mathrm{river}} + Q_{\mathrm{wave}} + Q_{\mathrm{tide}}} \qquad (4)$$


where x represents river, wave or tide. The relative sediment flux r can 
vary between 0 and 1, whereas the river- and tidal-dominance ratios 
R and T vary between 1/∞ and ∞ (Fig. 1a, b). r allows us to uniquely position 
a river delta within the ternary diagram and characterize its two 
first-order morphological indicators, the delta protrusion angle and 
the channel width divergence. As with wave and river dominance, 
a delta is considered tide-dominated if Q_tide exceeds both Q_river 
and Q_wave. By assessing Q_river, Q_tide and Q_wave for all deltas globally, we find 
that 8,551 (79%) are wave-dominated, 1,170 (11%) are river-dominated 
and 1,127 (10%) are tide-dominated. 
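
The ternary positioning and the dominance rule can be illustrated with a short sketch. The function names are hypothetical, and using the disturbed fluvial flux from Extended Data Table 4 as Q_river in the usage lines is a choice made here for illustration only.

```python
def relative_fluxes(q_river, q_wave, q_tide):
    """Fraction r_x of the total sediment flux for x in {river, wave, tide}."""
    total = q_river + q_wave + q_tide
    if total <= 0:
        raise ValueError("at least one sediment flux must be positive")
    return {
        "river": q_river / total,
        "wave": q_wave / total,
        "tide": q_tide / total,
    }


def dominant_process(q_river, q_wave, q_tide):
    """Label a delta by whichever flux exceeds the other two."""
    fractions = relative_fluxes(q_river, q_wave, q_tide)
    return max(fractions, key=fractions.get)


# Fluxes in kg/s, taken from Extended Data Table 4 (disturbed fluvial flux).
amazon = dominant_process(3.1e4, 2.9e-1, 7.7e5)
mississippi = dominant_process(4.2e3, 1.0e3, 9.8e2)
eel = dominant_process(7.5e1, 2.5e3, 1.5e2)
```

Because the three fractions sum to one, each delta maps to a single point in the ternary diagram.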


Accuracy of delta morphology prediction 

To test our predictions of delta morphology, we analysed 212 deltas 
on Madagascar, supplemented by 100 deltas picked randomly from 
our dataset, and visually categorized them as river-, wave- or tide- 
dominated (Extended Data Table 2). Following Olofsson et al.**, we 
obtain prediction accuracies of 91%, 55% and 64%, for wave-, river- and 
tide-dominated deltas, respectively, which indicate the likelihood 
that any one particular delta is classified correctly (equation 2 in 
ref. **). By weighting by their occurrence, we obtain an overall accuracy 
of 85% (±2%, determined through bootstrapping) (equation 4 in 
ref. +). By correcting for the size of the dataset, we obtain estimates of 
the 95% confidence interval of the global fraction of wave-, river- and 
tide-dominated deltas of 79% ± 9%, 11% ± 2%, and 10% ± 3%, respectively 
(equation 11 in ref. **). 
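
The occurrence weighting can be sketched as follows. This is an illustration under stated assumptions: the per-class accuracies are the rounded percentages quoted above, the class counts are those reported for the global dataset, and the function names are hypothetical. With rounded inputs the result is about 0.84, close to the reported 85%.

```python
def users_accuracy(confusion, label):
    """Correct predictions divided by all predictions of that class.

    `confusion[predicted][observed]` holds counts (rows are predictions).
    """
    row = confusion[label]
    return row[label] / sum(row.values())


def weighted_overall_accuracy(class_counts, class_accuracy):
    """Sum of per-class accuracies weighted by each class's share of the dataset."""
    total = sum(class_counts.values())
    return sum(class_counts[k] / total * class_accuracy[k] for k in class_counts)


counts = {"wave": 8551, "river": 1170, "tide": 1127}       # global class counts
accuracy = {"wave": 0.91, "river": 0.55, "tide": 0.64}     # rounded values from the text
overall = weighted_overall_accuracy(counts, accuracy)

# Hypothetical confusion counts, only to show how a per-class accuracy arises.
example = {"wave": {"wave": 91, "river": 5, "tide": 4}}
wave_accuracy = users_accuracy(example, "wave")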

We note that although the island of Madagascar has a large variety of 
coastal landforms, it is not necessarily a good statistical representation 
of coastlines worldwide. Our morphological accuracy assessment is 
therefore biased, and we do not adjust the gross total proportion of river-, 
wave- or tide-dominated deltas on the basis of our visual assessment. 


Measurements of recent deltaic change 
We measure the deltaic surface area change by combining our dataset 
of river mouths and their associated deltas with surface-water changes 
between 1985 and 2015 mapped on a global scale by the Aqua 
Monitor”. To select the appropriate coastal change per delta we first 
determine delta extents along the NOAA vectorized shoreline dataset*. 
Next, we use an empirical approximation of the delta area™, 
~1.07(Q5\ o Qh. 47ve,)/Den(in square kilometres), where Q,,,iver is the river 
discharge and D_sh is the shelf depth, here D_sh = 100 m (ref.*). We obtain 
a delta radius (√(area/π)), set a minimum radius of 2 km for small 
deltas, and match every shoreline location within the radius of that 
particular delta (Extended Data Fig. 5). Using Google Earth Engine”, 
we then retrieve local surface-water changes along these deltaic coast- 
lines, summing land gain and land loss along the NOAA vectorized 
shorelines within a buffer equal to one-tenth of the delta radius 
(Extended Data Fig. 5). The NOAA shorelines include banks of wide 
coastal channels such as estuaries. By selecting only land area change 
near the NOAA shorelines, we exclude land—water conversion within 
delta interiors (away from channel banks and shorelines), for which R 
and T are not appropriate indicators. Land area change resulting from, 
for example, subsidence, tectonic activity, or delta plain engineering, 
is therefore probably not fully captured in our reported delta area 
change. Land area change of abandoned delta lobes near active parts of 
the delta might be included. We note the potential for sizeable anthro- 
pogenic effects on land gain and land loss (for example, land reclama- 
tion), and therefore mask out portions of each delta that are classified 
as urban/artificial (class 190) areas by the GlobCover® dataset. 
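
The geometric step in this procedure (area to radius, the 2 km floor and the one-tenth buffer) can be sketched as below. The function names are hypothetical, and the delta area is taken as an input rather than computed from the empirical approximation.

```python
import math


def delta_radius_km(area_km2, min_radius_km=2.0):
    """Radius of a circle with the delta's area, with a 2 km minimum."""
    return max(math.sqrt(area_km2 / math.pi), min_radius_km)


def shoreline_buffer_km(radius_km):
    """Buffer around the vectorized shoreline: one-tenth of the delta radius."""
    return radius_km / 10.0


large = delta_radius_km(math.pi * 25.0)   # a 25*pi km^2 delta has a 5 km radius
small = delta_radius_km(1.0)              # a small delta is clamped to 2 km
buffer_large = shoreline_buffer_km(large)
```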

We estimate the uncertainty in the land gain and land loss measure- 
ments by combining three sources of error. The first source of error 
lies in the per-pixel classification of water versus land. The Global 
Surface Water Explorer reports uncertainty of about 1% in their clas- 
sification®. The Aqua Monitor uses a similar classification algorithm 
and therefore probably has similar uncertainty. The second source of 
error is the categorization of changes in the water-to-land and land-to- 
water transition. We estimate this uncertainty by comparing deltaic 
land area changes between the AquaMonitor” and the Global Surface 
Water Explorer’, which use different algorithms to classify transitions. 
We obtain a covariance of 7%, which we include as a measure of the 
spatial uncertainty. 

A third source of uncertainty is the shoreline length and buffer 
assigned to every delta, and how much of the change within and out- 
side that area is related to delta morphodynamics. To quantify this 
uncertainty, we manually map the surface extents of 40 deltas in Mada- 
gascar and measure land surface changes within those deltas. A com- 
parison with automatically mapped areas yields a standard error of 1%. 
We combine the three sources of uncertainty and obtain a standard error 
of the mean of 9% per delta. The total net deltaic land area change ± 2 s.d. 
for the 10,848 deltas in the dataset between 1985 and 2015 is 54 ± 12 km² yr⁻¹. 

Aside from a global assessment, we also compare land gain rates 
of specific deltas to values reported by case studies in the literature 
(Extended Data Table 5). For the Mississippi Delta comparison, we 
therefore include land loss rates of the ‘birdfoot’ area closest to the 
river mouth, as well as the Breton Sound basin as defined by Couvillion 
et al.””. For the seven deltas considered, the global analysis seems to 
capture delta land loss and land gain in the same order of magnitude. 
Because the time periods and spatial coverages of these studies do not 
align, we use this only to illustrate similarities and differences between 
our reported land gain and earlier studies. 


Data availability 

All primary sources (OSU TOPEX®, NOAA WaveWatch”, USGS 
HydroSheds*’, USGS SRTM”’”, WBMSed” and AquaMonitor”’ data) 
are publicly available. Wave and tide data can also be found at 
https://jhnienhuis.users.earthengine.app. The resulting morphological 
predictions for all 10,848 deltas are available as .mat and .kml files at 
https://doi.org/10.17605/OSF.IO/S28QB. Source data for Figs. 1-3 are 
provided with the paper. 


Code availability 


The Matlab computer code that reproduces our findings is available 
at https://github.com/jhnienhuis/GlobalDeltaChange and https://osf. 
io/s28qb/. 
Extended Data Fig. 1 | Overview of the algorithm that identifies river deltas 
using HydroSheds data. a, HydroSheds drainage basins and the included 
deltas are shown for Veracruz, Mexico. b, Close-up of a, showing the included 
deltas and the tracked river channel for the channel slope calculation. Scale 
bars show the resolution of the WaveWatch” and TOPEX datasets”. 
Extended Data Fig. 3 | WBMSed model predictions of human-induced change to the deltaic fluvial sediment flux. Colours indicate the ratio of the modern 
fluvial sediment flux (Q^d_river; here Q_river,dist) to the flux in a world without anthropogenic modifications* (Q^p_river; here Q_river,prist). 
Extended Data Fig. 4 | Characterization of data used for wave- and tide-driven deltaic sediment flux. a, Global maximum potential alongshore sediment 
transport (Q_wave) based on the WaveWatch 30-year hindcast data*”. b, Global estimate of mean tidal amplitude based on the OSU TOPEX data®. 
Extended Data Fig. 5 | Example of recent deltaic land area change for the 
north shore of Java, Indonesia. Land loss and land gain were measured using 
Landsat (http://landsat.usgs.gov/) images from Google Earth Engine” based 
… of human-induced increases in the fluvial sediment flux. The top image shows 
the coastal change, with the red markers and black outlines representing 
individual deltas and their coastlines, respectively. 
Extended Data Table 1 | Confusion matrix of the number of deltas on Madagascar 
We note that the true-negative rate (no delta observed, no delta predicted) is infinite and therefore not included in our analysis. 
Extended Data Table 2 | Confusion matrix of the delta morphologic prediction based on a validation dataset of 312 deltas 
Extended Data Table 3 | Yearly deltaic land gain, loss and net gain for different regions 

Values represent averages from 1985 to 2015. Error limits indicate 2 s.d. 

Region                         Land gain      Land loss      Net land gain 
                               (km² yr⁻¹)     (km² yr⁻¹)     (km² yr⁻¹) 
Global                         181 ± 8.3      -127 ± 8.3     54 ± 11.8 
East Africa                    6 ± 1.6        -3 ± 1.6       3 ± 2.3 
South Asia                     42 ± 1.7       -32 ± 1.7      10 ± 2.4 
West Africa                    3 ± 1.3        -3 ± 1.3       1 ± 1.8 
Europe                         10 ± 2.5       -3 ± 2.5       8 ± 3.6 
Central America                2 ± 1.8        -1 ± 1.8       1 ± 2.5 
Russia                         1 ± 2.2        -1 ± 2.2       0 ± 3.1 
East Asia                      34 ± 2.5       -22 ± 2.5      11 ± 3.6 
Northern Africa/Middle-East    5 ± 1.2        -2 ± 1.2       3 ± 1.8 
Eastern North America          6 ± 2.2        -11 ± 2.2      -4 ± 3.2 
Western North America          2 ± 1.6        -2 ± 1.6       0 ± 2.3 
Oceania                        6 ± 3.0        -5 ± 3.0       1 ± 4.3 
Eastern South America          33 ± 2.0       -17 ± 2.0      16 ± 2.9 
Western South America          3 ± 1.2        -2 ± 1.2       2 ± 1.8 
Southeast Asia                 27 ± 4.2       -23 ± 4.2      4 ± 5.9 



Extended Data Table 4 | Predicted sediment transport fluxes for a selection of well-known deltas 


Columns: delta; river water discharge Q_w,river (m³ s⁻¹); pristine fluvial sediment flux Q^p_river (kg s⁻¹); disturbed fluvial sediment flux Q^d_river (kg s⁻¹); wave-driven sediment flux Q_wave (kg s⁻¹); tide-driven sediment flux Q_tide (kg s⁻¹); net land gain (km² yr⁻¹). 

Delta Q_w,river Q^p_river Q^d_river Q_wave Q_tide Net land gain 

Amazon 2.0E+5 3.8E+4 3.1E+4 2.9E-1 7.7E+5 1.0E+1 
Arno 5.7E+1 7.0E+1 1.0E+0 1.7E+2 1.4E-1 2.1E-2 
Colorado, MX 6.9E+2 3.8E+3 4.1E+0 2.9E+1 7.0E+3 -2.7E-1 
Copper 1.2E+3 2.2E+3 3.4E+2 7.7E+2 2.8E+3 -7.1E-2 
Danube 6.4E+3 2.1E+3 6.4E+2 2.3E+1 1.7E+1 3.7E-1 
Ebro 1.4E+3 5.8E+2 2.8E+1 3.5E+1 4.0E+0 -3.8E-1 
Eel, CA 2.4E+2 5.6E+2 7.5E+1 2.5E+3 1.5E+2 -1.7E-1 
Elbe 4.2E+2 4.9E+2 2.5E+2 9.8E+0 4.1E+6 -2.7E-4 
Ganges-Brahmaputra 3.1E+4 3.5E+4 3.5E+4 0.0E+0 2.0E+6 4.9E+0 
Godavari 2.7E+3 5.4E+3 5.2E+3 3.8E+2 1.0E+2 5.0E-1 
Huanghe 1.5E+3 3.5E+4 3.8E+3 2.3E+1 1.8E+1 -8.3E+0 
Klamath 4.7E+2 3.2E+2 1.5E+2 2.4E+3 1.9E+3 1.2E-2 
Lena 1.6E+4 6.3E+2 5.1E+3 1.2E+0 7.8E+2 7.1E-3 
Mekong 1.7E+4 3.1E+3 3.0E+3 3.3E+1 4.0E+5 -2.1E-1 
Mississippi 1.5E+4 1.3E+4 4.2E+3 1.0E+3 9.8E+2 -5.2E+0 
Niger 6.1E+3 1.3E+3 8.0E+2 6.1E+2 4.8E+3 -4.9E-2 
Nile 3.5E+3 3.8E+3 7.6E+1 2.2E+2 2.5E+2 -7.0E-1 
Orange 4.4E+2 2.8E+3 3.0E+2 2.9E+3 1.1E+1 2.4E-1 
Parana 1.5E+4 2.8E+3 2.5E+3 0.0E+0 9.0E+2 9.4E-1 
Po 1.5E+3 5.5E+2 3.0E+2 4.2E+1 4.9E+2 1.2E-1 
Rhine-Meuse 2.0E+3 2.0E+3 5.5E+2 1.2E+2 1.7E+4 6.5E-1 
Rhone 1.7E+3 1.9E+3 5.6E+2 1.6E+2 7.9E+1 1.7E-1 
Sao Francisco 3.6E+3 2.5E+3 1.7E+3 4.4E+3 1.7E+3 2.0E-2 
Schelde 9.8E+1 1.9E+1 5.0E+0 6.8E+2 1.2E+0 3.2E-3 
Senegal 6.9E+2 5.6E+2 4.3E+2 5.0E+2 8.9E+3 -4.7E-2 
Volga 8.2E+3 6.0E+2 1.5E+3 0.0E+0 7.8E-1 3.8E+0 
Yangtze 2.8E+4 1.5E+4 9.0E+3 6.1E+1 2.0E+4 -2.7E+0 


See also Fig. 2b. 


Extended Data Table 5 | Comparison of net land gain estimates with case studies from the literature 


Delta                 Net land gain ± 2 s.d.    Net land gain                 Study period   Source   Note 
                      (km² yr⁻¹, this study)    (km² yr⁻¹, other studies) 
Ebro                  -0.4 ± 0.2                -0.2                          1957-1992      54       Based on shoreline transects 
Ganges-Brahmaputra    4.9 ± 0.2                 12.3                          1973-2016      55       Hatiya and Bhola districts 
Ganges-Brahmaputra    4.9 ± 0.2                 0.4                           1989-2009      56       Coastal Bangladesh 
Huanghe               -8.2 ± 0.2                -4.0                          1999-2011      57       Modern lobe 
Mekong                -0.2 ± 0.2                0.5                           2003-2012      32       Delta distributary mouths 
Mississippi           -5.2 ± 0.2                -0.5                          1985-2015      22       Birdfoot region 
Mississippi           -5.2 ± 0.2                -15.0                         1985-2015      22       Breton Sound basin 
Nile                  -0.7 ± 0.2                -0.2                          1990-2014      58 
Parana                0.9 ± 0.2                 2.0                           1995-2015      59 
The origin of eukaryotes remains unclear’ *. Current data suggest that eukaryotes 
may have emerged from an archaeal lineage known as ‘Asgard’ archaea>*. Despite the 
eukaryote-like genomic features that are found in these archaea, the evolutionary 
transition from archaea to eukaryotes remains unclear, owing to the lack of cultured 
representatives and corresponding physiological insights. Here we report the decade-
long isolation of an Asgard archaeon related to Lokiarchaeota from deep marine 
sediment. The archaeon, ‘Candidatus Prometheoarchaeum syntrophicum’ strain 
MK-D1, is an anaerobic, extremely slow-growing, small coccus (around 550 nm in 
diameter) that degrades amino acids through syntrophy. Although eukaryote-like 
intracellular complexes have been proposed for Asgard archaea’, the isolate has no 
visible organelle-like structure. Instead, Ca. P. syntrophicum is morphologically 
complex and has unique protrusions that are long and often branching. On the basis 
of the available data obtained from cultivation and genomics, and reasoned 
interpretations of the existing literature, we propose a hypothetical model for 
eukaryogenesis, termed the entangle-engulf-endogenize (also known as E³) model. 


How the first eukaryotic cell emerged remains unclear. Among vari- 
ous competing evolutionary models, the most widely accepted are 
symbiogenic models in which an archaeal host cell and an alphapro- 
teobacterial endosymbiont merged to become the first eukaryotic 
cell’ *. Recent metagenomic characterization of deep-sea archaeal 
group/marine benthic group-B (also known as Lokiarchaeota) and 
the Asgard archaea superphylum led to the theory that eukaryotes 
originated from an archaeon that was closely related to these lin- 
eages”*. The genomes of Asgard archaea encode a repertoire of 
proteins that are only found in Eukarya (eukaryotic signature pro- 
teins), including those involved in membrane trafficking, vesicle 
formation and/or transportation, ubiquitin and cytoskeleton forma- 
tion®. Subsequent metagenomic studies have suggested that Asgard 
archaea have a wide variety of physiological properties, including 
hydrogen-dependent anaerobic autotrophy’, peptide or short-chain 
hydrocarbon-dependent organotrophy® ” and rhodopsin-based 
phototrophy®™. However, no representative of the Asgard archaea 
has been cultivated and, thus, the physiology and cell biology of 
this clade remains unclear. In an effort to close this knowledge gap, 
we successfully isolated an archaeon of this clade, report its physiological 
and genomic characteristics, and propose a new model for 
eukaryogenesis. 


Isolation of an Asgard archaeon 


Setting out to isolate uncultivated deep marine sediment microor- 
ganisms, we engineered and operated a methane-fed continuous-flow 
bioreactor system for more than 2,000 days to enrich such organisms 
from anaerobic marine methane-seep sediments» (Supplementary 
Note 1). We successfully enriched many phylogenetically diverse yet- 
to-be cultured microorganisms, including Asgard archaea members 
(Loki-, Heimdall- and Odinarchaeota)®. For further enrichment and 
isolation, samples of the bioreactor community were inoculated in glass 
tubes with simple substrates and basal medium. After approximately 
one year, we found faint cell turbidity in a culture containing casamino 
acids supplemented with four bacteria-suppressing antibiotics 
(Supplementary Note 2) that was incubated at 20 °C. Clone library- 
based small subunit (SSU) rRNA gene analysis revealed a simple com- 
munity that contained Halodesulfovibrio and a small population of 
Lokiarchaeota (Extended Data Table 1). In pursuit of this archaeon, 
which we designated strain MK-D1, we repeated subcultures when 
MK-D1 reached maximum cell densities as measured by quantita- 
tive PCR (qPCR). This approach gradually enriched the archaeon, 
which has an extremely slow growth rate and low cell yield (Fig. 1a). 
The culture consistently had a 30-60-day lag phase and required more 
Fig. 1 | Growth curves and photomicrographs of the cultured Lokiarchaeota 
strain MK-D1. a, Growth curves of MK-D1 in anaerobic medium supplemented 
with casamino acids (CA) alone; casamino acids with 20 amino acids (AAs) and 
powdered milk (PM); or peptone with powdered milk. Results are also shown 
for cultures fed with 10- and 100-fold dilution of casamino acids, 20 amino 
acids and powdered milk. b, c, Fluorescence images of cells from enrichment 
cultures after 8 (b) and 11 (c) transfers stained with DAPI (violet) and hybridized 
with nucleotide probes that target MK-D1 (green) and Bacteria (red). Pie charts 
show the relative abundance of microbial populations based on SSU rRNA 
gene-tag sequencing (iTAG) analysis. d, A fluorescence image of cells from 
enrichment cultures after 11 transfers hybridized with nucleotide probes that 
target MK-D1 (green) and Methanogenium (red). The FISH experiments were 
performed three times with similar results. e, SEM image of a highly purified 
co-culture of MK-D1 and Methanogenium. White arrows indicate 
Methanogenium cells. We observed four different co-cultures with 
Methanogenium. Representative of n = 40 recorded images. The detailed 
iTAG-based community compositions of cultures corresponding to each of 
the images are shown in Supplementary Table 1. Scale bars, 10 µm (b, c) and 
5 µm (d, e). 


than 3 months to reach full growth: around 10° 16S rRNA gene copies 
ml (Fig. 1a). The doubling time was estimated to be approximately 
14-25 days. Variation in cultivation temperatures (Extended Data Fig. 1), 
and substrate combinations and concentrations did not significantly 
shorten the lag phase or improve growth rate or cell yield (data not 
shown). Static cultivation supplemented with 20 amino acids and powdered 
milk resulted in stable growth. For further characterization, we 
cultured the archaeon under the optimal conditions determined above. 
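
To put the reported doubling times in perspective, they can be converted to specific growth rates. This is a back-of-the-envelope sketch that assumes simple exponential growth, which the text does not state explicitly; the function names and the 90-day window are chosen here for illustration.

```python
import math


def specific_growth_rate(doubling_time_days):
    """Growth rate per day, assuming exponential growth: mu = ln(2) / t_d."""
    return math.log(2) / doubling_time_days


def doublings(elapsed_days, doubling_time_days):
    """Number of doublings that fit into an exponential growth period."""
    return elapsed_days / doubling_time_days


mu_fast = specific_growth_rate(14)   # about 0.050 per day
mu_slow = specific_growth_rate(25)   # about 0.028 per day
n_fast = doublings(90, 14)           # about 6.4 doublings in 90 days
n_slow = doublings(90, 25)           # 3.6 doublings in 90 days
```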

After six transfers, MK-D1 reached 13% abundance in a tri-culture 
containing a Halodesulfovibrio bacterium (85%) and a Methanogenium 
archaeon (2%) (Extended Data Table 1). Analyses using fluorescence 
in situ hybridization (FISH) and scanning electron microscopy (SEM) 
revealed a close physical association of the archaeon with the other 
microorganisms (Fig. 1b-e, Extended Data Fig. 3 and Supplementary 
Table 1). Through metagenome-based exploration of the metabolic 
potential of this archaeon and a stable-isotope probing experiment, 
we discovered that MK-D1 can catabolize ten amino acids and pep- 
tides through syntrophic growth with Halodesulfovibrio and Metha- 
nogenium through interspecies hydrogen (and/or formate) transfer’® 
(Fig. 2, Extended Data Fig. 2 and Supplementary Tables 2-4). Indeed, 
addition of hydrogen scavenger-inhibiting compounds (that is, 10 mM 
molybdate and 2-bromoethanesulfonate for sulfate-reducing bacteria 
(SRB) and methanogens, respectively) significantly impaired growth 
of MK-D1. Through subsequent transfers, we were able to eliminate the 
Halodesulfovibrio population, enabling us to obtain a pure co-culture of 
the target archaeon MK-D1 and Methanogenium after a 12-year study— 
from bioreactor-based pre-enrichment of deep-sea sediments to a final 
7 years of in vitro enrichment. We here propose the name ‘Candidatus 
Prometheoarchaeum syntrophicum’ strain MK-D1 for the isolated 
archaeon (see Supplementary Note 3 for reasons why the provisional 
Candidatus status is necessary despite isolation). 


Cell biology, physiology and metabolism 


We further characterized MK-D1 using the pure co-cultures and highly 
purified cultures. Microscopy analyses showed that the cells were small 
cocci (approximately 300-750 nm in diameter (average, 550 nm)), and 
generally formed aggregates surrounded by extracellular polymer 
substances (EPS) (Fig. 3a, b and Extended Data Fig. 3), consistent with 
previous observations using FISH”. MK-D1 cells were easily identifi- 
able given the morphological difference from their co-culture partner 
Methanogenium (highly irregular coccoid cells of >2 µm; Fig. 1d, e). 
Dividing cells had less EPS and a ring-like structure around the cells 
(Fig. 3c). Cryo-electron microscopy (cryo-EM) and transmission 
electron microscopy (TEM) analyses revealed that the cells contain 
no visible organelle-like inclusions (Fig. 3d-f and Supplementary 
Videos 1-6), in contrast to previous suggestions*. For cryo-EM, cells 
were differentiated from vesicles on the basis of the presence of 
cytosolic material (although DNA and ribosomes could not be dif- 
ferentiated), EPS on the cell surface and cell sizes that were consist- 
ent with observations by SEM and TEM analyses (Supplementary 
Videos 4-6). The cells produce membrane vesicles (50-280 nm in 
diameter) (Fig. 3b-f) and chains of blebs (Fig. 3c). MK-D1 cells also 
form membrane-based cytosol-connected protrusions of various 
lengths that have diameters of 80-100 nm, and display branching with 
a homogeneous appearance unlike those of other archaea (Fig. 3g-i; 
confirmed using both SEM and TEM). These protrusions neither form 
elaborate networks (as in Pyrodictium'®) nor intercellular connections 
(Pyrodictium, Thermococcus and Haloferax'*”°), suggesting differences 
in physiological functions. The MK-D1 cell envelope may be composed 
of a membrane and a surrounding S-layer, given the presence 
of four genes that encode putative S-layer proteins (Supplementary 
Fig. 1), stalk-like structures on the surface of the vesicles (Fig. 3e and 
Extended Data Fig. 3f, g) and the even distance between the inner and 
outer layers of the cell envelope (Fig. 3d). Lipid composition analysis 
of the MK-D1 and Methanogenium co-culture revealed typical archaeal 
isoprenoid signatures: C₂₀-phytane and C₄₀-biphytanes with 0-2 cyclopentane 
rings were obtained after ether-cleavage treatment (Fig. 3j). 
Considering the lipid data obtained from a reference Methanogenium 
isolate (99.3% 16S rRNA gene identity; Supplementary Fig. 2), MK-D1 
probably contains C₂₀-phytane and C₄₀-biphytanes with 0-2 rings. 
The MK-D1 genome encoded most of the genes necessary to synthe- 
size ether-type lipids—although geranylgeranylglyceryl phosphate 
synthase was missing—and lacked genes for ester-type lipid synthesis 
(Supplementary Tables 3, 4). 
Fig. 2 | Syntrophic amino acid utilization of MK-D1. a, Genome-based 
metabolic reconstruction of MK-D1. Metabolic pathways identified (coloured 
or black) and not identified (grey) are shown. For identified pathways, each 
step (solid line) or process (dotted) is marked by whether it is oxidative (red), 
reductive (blue), ATP-yielding (orange) or ATP-consuming (purple). Wavy 
arrows indicate exchange of compounds: formate, H₂, amino acids, vitamin B₁₂, 
biotin, lipoate and thiamine pyrophosphate (TPP), which are predicted to be 
metabolized or synthesized by the partnering Halodesulfovibrio and/or 
Methanogenium. Biosynthetic pathways are indicated with a yellow 
background. Metatranscriptomics-detected amino-acid-catabolizing 
pathways are indicated (black dots above amino acids). DHDH, 4,5-dihydroxy-
2,6-dioxohexanoate; DHDG, 2-dehydro-3-deoxy-D-gluconate; DHDG6P, 
3-dehydro-3-deoxy-D-gluconate 6-phosphate; Ac-CoA, acetyl-CoA; uro, 
urocanate; Fo-Glu, formyl glutamate; CH₂=H₄F, methylene-tetrahydrofolate; 
CH≡H₄F, methenyl-tetrahydrofolate; Fo-H₄F, formyl-tetrahydrofolate; 2OB, 
2-oxobutyrate; Prop-CoA, propionyl-CoA; ACAC, acetoacetate; GB-CoA, 


MK-D1 can degrade amino acids anaerobically, as confirmed by 
monitoring the depletion of amino acids during the growth of pure 
co-cultures (Extended Data Fig. 1b, c). We further verified the utilization 
of amino acids by quantifying the uptake of a mixture of ¹³C- 
and ¹⁵N-labelled amino acids through nanometre-scale secondary 
ion mass spectrometry (NanoSIMS) (Fig. 2b-e). Cell aggregates of 
MK-D1 incorporated amino-acid-derived nitrogen, demonstrating 
the capacity of MK-D1 to utilize amino acids for growth. Notably, the 
¹³C-labelling of methane and CO₂ varied depending on the methanogenic 
partner, indicating that MK-D1 produces both hydrogen and 
formate from amino acids for interspecies electron transfer (Extended 
Data Table 2). Indeed, addition of high concentrations of hydrogen 
or formate completely suppressed growth of MK-D1 (Extended Data 
Table 3). The syntrophic partner was replaceable—MK-D1 could also 
grow syntrophically with Methanobacterium sp. strain MO-MB1” 
instead of Methanogenium (Fig. 2b-e). Although 14 different culture 
conditions were applied, none enhanced the cell yield, which indicates 




γ-amino-butyryl-CoA; But-CoA, butyryl-CoA; Fd, ferredoxin; XSH/X-S-S-X, 
thiol/disulfide pair; TCA, tricarboxylic acid cycle; PPP, pentose-phosphate 
pathway. b-e, NanoSIMS analysis of a highly purified MK-D1 culture incubated 
with a mixture of ¹³C- and ¹⁵N-labelled amino acids. b, Green fluorescent 
micrograph of SYBR Green I-stained cells. Aggregates are MK-D1, and 
filamentous cells are Methanobacterium sp. strain MO-MB1 (fluorescence can 
be weak owing to the high rigidity and low permeability of the cell membrane 
(Extended Data Fig. 2m, n; see also ref. *”)). c, NanoSIMS ion image of ¹²C (cyan). 
d, NanoSIMS ion image of ¹²C¹⁵N/¹²C¹⁴N (magenta). e, Overlay image of b-d. 
d, The colour bar indicates the relative abundance of ¹⁵N expressed as ¹⁵N/¹⁴N. 
Scale bars, 5 µm. The NanoSIMS analysis was performed without replicates 
owing to the slow growth rate and low cell density of MK-D1. However, to ensure 
reproducibility, we used two different types of highly purified cultures of 
MK-D1 (see Methods). Representative of n = 8 recorded images. The iTAG 
analysis of the imaged culture is shown in Supplementary Table 1. 


specialization of the degradation of amino acids and/or peptides 
(Extended Data Table 3). 

To further characterize the physiology of the archaeon, we analysed 
the complete MK-D1 genome (Extended Data Fig. 2 and Supplemen- 
tary Tables 2-6). The genome only encodes one hydrogenase (NiFe 
hydrogenase MVhADG-HdrABC) and formate dehydrogenase (molyb- 
dopterin-dependent FdhA), suggesting that these enzymes mediate 
reductive H, and formate generation, respectively. MK-D1 represents, 
to our knowledge, the first cultured archaeon that can produce and 
syntrophically transfer H, and formate using the above enzymes. We 
also found genes encoding proteins for the degradation of ten amino 
acids. Most of the identified amino-acid-catabolizing pathways only 
recover energy through the degradation of a 2-oxoacid intermediate 
(that is, pyruvate or 2-oxobutyrate; Fig. 2a and Supplementary Table 4). 
MK-D1 can degrade 2-oxoacids hydrolytically (through 2-oxoacid-for- 
mate lyases) or oxidatively (through 2-oxoacid:ferredoxin oxidoreduc- 
tases) to yield acyl-CoA intermediates that can be further degraded 

Fig. 3 | Microscopy characterization and lipid composition of MK-D1. 

a-c, SEM images of MK-D1. Single cell (a), aggregated cells covered with EPS-like 
materials (b) and a dividing cell with polar chains of blebs (c). d, Cryo-electron 
tomography image of MK-D1. The top-right inset image shows a magnification 
of the boxed area to show the cell envelope structure. e, Cryo-EM image of large 
membrane vesicles attached to and surrounding MK-D1 cells. f, Ultrathin 
section of an MK-D1 cell and a membrane vesicle. The bottom-right inset image 
shows a magnified view of the membrane vesicle. g, h, SEM images of MK-D1 
cells producing long branching (g) and straight (h) membrane protrusions. 

i, Ultrathin section of an MK-D1 cell with protrusions. j, A total ion chromatogram 
of gas chromatography-mass spectrometry (GC-MS) for lipids extracted from 
a highly purified MK-D1 culture. The chemical structures of isoprenoids and 


for ATP generation. In the hydrolytic path, the carboxylate group of 
the amino acid is released as formate that can be directly handed off 
to partnering methanogenic archaea or SRB. In the oxidative path, 
2-oxoacid oxidation is coupled with release of amino acid carboxylate 
as CO₂ and reduction of ferredoxin, which can be re-oxidized through 
H⁺ and/or CO₂ reduction to H₂ and formate, respectively (through the 
electron-confurcating NiFe hydrogenase MvhADG-HdrABC or formate 
dehydrogenase FdhA). On the basis of ¹³C-amino-acid-based experi- 
ments (Supplementary Note 4), MK-D1 can probably switch between 
syntrophic interaction through 2-oxoacid hydrolysis and oxidation 
depending on the partner(s). 

Etymology. Prometheoarchaeum, Prometheus (Greek): a Greek god 
who shaped humans out of mud and gave them the ability to create fire; 
archaeum from archaea (Greek): an ancient life. The genus name is an 
analogy between the evolutionary relationship of this organism to the 
origin of eukaryotes, and the involvement of Prometheus in the origin 
of humans from sediments and the acquisition of an unprecedented 
oxygen-driven energy-harnessing ability. The species name, syntrophi- 
cum, syn (Greek): together with; trephein (Greek): nourish; icus (Latin): 
pertaining to. The species name refers to the syntrophic substrate 
utilization property of this strain. 

Locality. Isolated from deep-sea methane-seep sediment of the 
Nankai Trough at 2,533 m water depth, off the Kumano area, Japan. 

their relative compositions are also shown (Supplementary Fig. 2). Scale bars, 
1 μm (b, c, g, h), 500 nm (a, d, e, i) and 200 nm (f). a-c, g, h, SEM images are 
representative of n=122 recorded images that were obtained from four 
independent observations from four culture samples. d, e, Cryo-EM images are 
representative of n=14 recorded images that were taken from two independent 
observations from two culture samples. f, i, The ultrathin section images are 
representative of n=131 recorded images that were obtained from six 
independent observations from six culture samples. White arrows in the 
images indicate large membrane vesicles. The lipid composition experiments 
were repeated twice and gave similar results. Detailed iTAG-based community 
compositions of the cultures are shown in Supplementary Table 1. 


Diagnosis. Anaerobic, amino-acid-oxidizing archaeon, small coccus, 
around 550 nm in diameter, syntrophically grows with hydrogen- and 
formate-using microorganisms. It produces membrane vesicles, chains 
of blebs and membrane-based protrusions. 


Extant and ancestral features 


The evolutionary relationship between archaea and eukaryotes has 
been under debate, hinging on the incompleteness and contamination 
associated with metagenome-derived genomes and variation in results 
that depend on tree-construction protocols”. By isolating MK-D1, we 
were able to obtain a closed genome (Extended Data Fig. 2 and Supple- 
mentary Table 2) and construct ribosomal protein-based phylogenomic 
trees that show a clear phylogenetic sister relation between MK-D1 and 
Eukarya (Fig. 4a, Extended Data Fig. 4 and Supplementary Tables 7, 8). 
Thus, MK-D1 represents the closest cultured archaeal relative of eukary- 
otes. We confirmed the presence of 80 eukaryotic signature proteins, 
which are also observed in related Asgard archaea (Supplementary 
Figs. 3-13 and Supplementary Tables 3, 9). Moreover, RNA-based evi- 
dence for expression of such genes was obtained. Among eukaryotic 
signature proteins, 23 fall in the 500 most highly expressed genes, 
including hypothetical proteins related to actin, gelsolin, ubiquitin, 
ESCRT-III proteins (Vps2/24/46-like and Vps20/32/60-like), Roadblock/ 

Fig. 4 | Phylogeny of MK-D1 and catabolic features of Asgard archaea. 

a, Maximum-likelihood tree (100 bootstrap replicates) of MK-D1 and select 
cultured archaea, eukaryotes and bacteria based on 31 ribosomal proteins that 
are conserved across the three domains (Supplementary Tables 7, 8). Bootstrap 
values around critical branching points are also shown. We used 14,024 sites of 
the alignment for tree construction. b, The presence or absence of amino acid 
degradation, electron metabolism, fermentation, C1 metabolism, sulfur 
metabolism and aerobic respiration in individual genomes are shown 
(complete pathway, full circle; mostly complete pathway, half circle). For amino 
acid metabolism, pathways that are exclusively used for catabolism or 


LC7 domain proteins and small GTP-binding domain proteins (Supple- 
mentary Tables 3, 9). Notably, MK-D1 simultaneously expresses three 
systems that could potentially contribute to cell division (FtsZ, actin 
and ESCRT-II/III; Supplementary Table 3). 

Given the phylogenetic relationship of MK-D1, other Asgard archaea 
and eukaryotes, estimating the physiological traits of the last Asgard 
archaea common ancestor is of utmost importance. Comparative genom- 
ics between MK-D1 and published metagenome-assembled genomes 
of Asgard archaea revealed that most of the members encode amino- 
acid-catabolizing pathways, NiFe hydrogenases (MvhADG-HdrABC”° 
and/or HydAD”’) (Fig. 4b), and have restricted biosynthetic capacities 
(that is, amino acid and vitamin synthesis; Extended Data Fig. 5), indicat- 
ing that H₂-evolving amino acid degradation and partner dependence 
may be a common feature across the superphylum. Like MK-D1, other 
members of the Asgard archaea possess enzymes associated with syn- 
trophic bacteria (the electron transfer complex FlxABCD-HdrABC”’ and 
formate dehydrogenases), indicating that other archaea have the capac- 
ity to degrade amino acids syntrophically. Many lineages also possess 
genes for fermentative propionate and/or butyrate production (Fig. 4b). 
Various other unique types of metabolism can be identified (for example, 
mono/tri-methylamine-driven homoacetogenesis and coupled H₂/S⁰ 
metabolism in Thorarchaeota; H₂S metabolism in Heimdallarchaeota; 
other types have been reported by other studies® **""*), but are either 


degradation are in bold. Glycine metabolism through pyruvate (top) or formate 
(bottom). Butyrate metabolism is reversible (fermentation or β-oxidation); 
however, butyryl-CoA dehydrogenases tend to be associated with EtfAB in the 
genomes, suggesting formation of an electron-confurcating complex for 
butyrate fermentation. Propionate was determined by the presence of 
methylmalonyl-CoA decarboxylase, biotin carboxyl carrier protein and 
pyruvate carboxylase. Propionate metabolism is also reversible; however, no 
member of the Asgard archaea encodes the full gene set for syntrophic 
degradation. Alcohol dehydrogenases can have diverse substrate specificities. 
See Supplementary Note 5 for abbreviations. 


only sporadically present or confined to specific phylum-level lineages. 
To identify potential ancestral features, we searched for catabolic genes 
that are conserved across phylum-level lineages including Heimdallar- 
chaeota (currently the most deep-branching Asgard archaea) that form 
monophyletic clusters. We found key catabolic genes for histidine, serine 
and threonine degradation (urocanate hydratase and serine/threonine 
dehydratase; Extended Data Figs. 6, 7), butyrate fermentation (fatty- 
acid-CoA ligase and 3-ketoacyl-CoA thiolase; Supplementary Figs. 14, 15) 
and propionate fermentation (succinate dehydrogenase flavopro- 
tein subunit, methylmalonyl-CoA transcarboxylase-associated biotin 
ligase and biotin carboxyl carrier protein; Supplementary Figs. 16-18). 
Given the physiology of the isolated MK-D1; the presence of amino acid 
catabolism and H₂ metabolism and the lack of biosynthetic pathways in 
nearly all extant Asgard archaea lineages; and conservation of the above 
metabolism types, we propose that the last Asgard archaea common 
ancestor was an amino-acid-degrading anaerobe that produced H₂ and 
fatty acids as by-products, acquired ATP primarily from substrate-level 
phosphorylation through catabolizing 2-oxoacid intermediates and 
depended on metabolic partners, although we do not reject the pos- 
sibility of other additional lifestyles. In summary, we provide evidence 
that Asgard archaea are capable of syntrophic degradation of amino 
acids, are dependent on symbiotic interactions for both catabolism 
and anabolism (for example, H₂, formate and metabolite transfer) and 

Fig. 5 | Proposed hypothetical model for eukaryogenesis. a, The syntrophic/ 
fermentative host archaeon is suggested to degrade amino acids to short-chain 
fatty acids and H₂, possibly by interacting with H₂-scavenging (and indirectly 
O₂-scavenging) SRB (orange; see Supplementary Note 6). b, The host may have 
further interacted with a facultatively aerobic organotrophic partner that 
could scavenge toxic O₂ (the future mitochondrion; red). Continued 
interaction with SRB could have been beneficial but not necessarily essential; 
dotted arrows indicate the interaction; see Supplementary Note 7. c, Host 
external structures could have interacted (for example, mechanical or 


conserve related fermentative metabolic features across the super- 
phylum, suggesting that the ancestor of the Asgard archaea possessed 
such capacities. This shows some congruence with a previous study that 
proposes hydrogenogenesis as a feature of the ancestor”, but differs in 
several central features. 


New insights into eukaryogenesis 


The origin of the eukaryotic cell is one of the most enigmatic ques- 
tions in biology. Isolation and cultivation of MK-D1 brings us closer 
to understanding how eukaryotes may have emerged; however, it is 
important to emphasize that the vast amount of time (roughly 2 billion 
years) that separates this modern-day organism from the organism 
that evolved into the last eukaryotic common ancestor (LECA) leaves 
many uncertainties—although we can make reasoned assumptions 
on the events that may have occurred during the course of evolution. 
The discussion that follows is a hypothetical model, in which we build 
on existing hypotheses with extrapolations from the insights gained 
in this study; notably, the model is not definitive and more studies on 
Asgard archaea and other deep-branching eukaryotes are required to 
contextualize the most probable steps that occurred. 

Assuming that the ancestor of the Asgard archaea was indeed syn- 
trophic, internally simple (that is, similar to MK-D1) and inhabited 
anaerobic marine sediments as most of the extant members of this 
lineage do®, evolution towards the facultatively aerobic LECA”’ can be 
envisioned to require (1) transition from anaerobiosis to aerobiosis, (2) 
the gain of an O₂-respiring and ATP-providing endosymbiont (that is, 
mitochondrion), and (3) development of intracellular structures. As 
Earth’s O₂ levels®° had begun to rise before the evolution of the LECA 
(the TACK-Asgard archaea lineage dated to approximately 2.1-2.4 bil- 
lion years ago”), we work on the assumption that the archaea needed 
to accommodate the increasing O₂ levels, and energy and organic sub- 
strate inputs, especially in benthic habitats of shallow oceans. Aero- 
tolerance might have been conferred by a symbiotic interaction with 
facultative O₂-respiring organisms*™, which was potentially followed 
by endosymbiosis of one of these aerobes (that is, the future mito- 
chondrion). Although such a transition from syntrophy to aerobiosis 

biological fusion®) with the aerobic partner to enhance physical interaction 
and further engulf the partner for simultaneous development of 
endosymbiosis and a primitive nucleoid-bounding membrane. d, After 
engulfment, the host and symbiont could have continued the interaction 
shown in b as a primitive type of endosymbiosis. e, Development of ADP/ATP 
carrier (AAC) by the endosymbiont (initial direction of ATP transport remains 
unclear; see Supplementary Note 9). f, Endogenization of partner symbiosis by 
the host through delegation of catabolism and ATP generation to the 
endosymbiont and establishment of a symbiont-to-host ATP channel. 


is non-trivial, we suggest that a syntrophic interaction with SRB could 
have mediated this (Fig. 5a, b and Supplementary Notes 6, 7). Although 
previous models propose that H₂ transfer was a key interaction that 
drove endosymbiosis””°>”°, we believe that current data favour the 
above interaction (see Supplementary Note 8). Given the small cell size 
of MK-D1 and the proposed lack of sufficient machinery” and energy’, 
we suggest that the physical manifestation of this endosymbiosis was 
probably independent of phagocytosis°. The observed morphology 
of strain MK-D1 rather points to a previously proposed alternative 
route” in which the host archaeon engulfed the metabolic partner 
using extracellular structures and simultaneously formed a primitive 
chromosome-surrounding structure that is topologically similar to the 
nuclear membrane; however, further evidence is required to support 
this conjecture (Fig. 5c, d). 

After engulfment, the host may have shared amino-acid-derived 
2-oxoacids with the endosymbiont as energy sources (Fig. 5d), given 
that amino-acid-degrading pathways widely encoded by Asgard 
archaea primarily recover ATP from 2-oxoacid degradation (Fig. 4b) 
and extant eukaryotes and mitochondria share 2-oxoacids*®. In return, 
the endosymbiont may have consumed O₂ (as proposed previously*’) 
and provided the host with an intracellular pool of biological building 
blocks (for example, amino acids and co-factors that the host may not 
have been able to synthesize that were released passively or through 
endosymbiont death). On the basis of the absence of host-derived 
(that is, archaea-related) anaerobic 2-oxoacid catabolism genes (for 
example, ferredoxin-dependent 2-oxoacid oxidoreductase and NiFe 
hydrogenases) in eukaryotes*”’, the host presumably lost these dur- 
ing evolution towards the LECA. Notably, this loss might have conse- 
quently helped to simultaneously resolve catabolic redundancy (that 
is, 2-oxoacid catabolism in both host and symbiont) and O₂ sensitivity 
(that is, O₂ inactivates these enzymes**’). For the resulting delegation 
of 2-oxoacid catabolism (and thus ATP generation) to the endosymbiont 
(as in modern mitochondria) to succeed, an ATP transport mechanism 
would have been necessary. Consistent with this notion, evolution of 
the ATP transporter (that is, the ADP/ATP carrier*’) is thought to have 
been instrumental in fixing the symbiosis** (see Supplementary Note 
9 for potential impetus; Fig. 5e). Another transition may have been 


necessary—the host archaeon may have possessed ether-type lipids as 
observed for MK-D1 (Fig. 3j) and Asgard archaea”, yet all extant eukary- 
otes use ester-type lipids. However, a recent study showed that lipid 
types can mix without losing membrane integrity*, suggesting that the 
simple replacement of host ether-type lipids with ester-type lipids may 
have been possible (Fig. 5e). This hypothetical evolutionary scenario 
may have provided the steps that are required for the emergence of 
an aerobic organotroph that possesses an O₂-respiring ATP-generating 
endosymbiont congruent with extant eukaryotes and their mitochon- 
dria in terms of energy metabolism (Fig. 5f). 

In summary, we have isolated and cultivated the closest archaeal 
relative of eukaryotes to date that has a unique metabolism and mor- 
phology and, combining these observations with genomic analyses, 
propose the entangle-engulf-endogenize model as one of several 
conceivable scenarios to explain the emergence of eukaryotes. Fur- 
ther investigation of MK-D1, related Asgard archaea and more deep- 
branching eukaryotes is now required and can provide valuable insights 
into the timing and progression of lateral gene transfer, endosymbiont 
organellogenesis towards the first mitochondrion and the formation of 
the endomembrane system (among many other physiological features). 
Such endeavours are essential to refine our understanding of the pos- 
sible chain of events that led to the eukaryotic cell, and to provide the 
necessary data that support or refute our models of eukaryogenesis. 

Methods 


No statistical methods were used to predetermine sample size. The 
experiments were not randomized. The investigators were not blinded 
to allocation during experiments and outcome assessment. 


Sampling site and sample description 

A 25-cm long sediment core (949C3) was collected from a methane- 
seep site at the Omine Ridge, Nankai Trough, off the Kumano area, 
Japan (33° 7.2253’ N, 136° 28.6672’ E), 2,533 m below sea level, by the 
manned submersible RV Shinkai 6500 (cruise YK06-03, dive no. 6K949, 
6 May 2006). The detailed sediment core sample and site information 
has been published previously®*!”. Our previous geochemical and 
16S rRNA gene analysis indicated that the occurrence of anaerobic 
oxidation of methane reactions was mediated by archaeal anaerobic 
methanotrophs in the sediment”. The SSU rRNA gene analysis also 
showed that the sediment contained abundant and diverse microorgan- 
isms, most of which were affiliated with uncultured microbial groups, 
including Asgard archaea’. 


Culturing 

The deep-sea methane-seep sediment sample was first enriched using 
a continuous-flow bioreactor system supplemented with methane as 
the major energy source. The bioreactor, called a down-flow hanging 
sponge (DHS) bioreactor, has been operated in our laboratory, JAM- 
STEC, Yokosuka Headquarters, since 28 December 2006. The detailed 
operation conditions for the DHS bioreactor have been described 
previously”. To isolate anaerobic microorganisms, including Asgard 
archaea, from the DHS reactor, 2-ml samples of the bioreactor enrich- 
ment sediment slurry were inoculated in 15-ml glass tubes with a simple 
substrate and a basal medium. The composition of the basal medium 
was similar to that used for cultivation in the DHS bioreactor’, 
but it did not contain sulfate (that is, Na₂SO₄). The basal medium com- 
position was as follows (per litre): 9.47 g MgCl₂·6H₂O, 1.36 g CaCl₂·2H₂O, 
20.7 g NaCl, 0.54 g NH₄Cl, 0.14 g KH₂PO₄, 2.7 g NaHCO₃, 0.3 g Na₂S·9H₂O, 
0.3 g cysteine-HCl, 1 ml trace element solution’, 1 ml Se/W solution, 
2 ml vitamin solution’ and resazurin solution (1 mg ml⁻¹). The medium 
was purged with N₂/CO₂ gas (80:20, v/v), and the pH was adjusted to 7.5 
at 25 °C. The culture tubes were sealed with butyl rubber stoppers and 
screw caps. Autoclaved or filter-sterilized organic substances (such as 
protein-derived materials, sugars and fatty acids) were added to the 
tubes with stock solutions before inoculation with the bioreactor- 
enriched community. After establishing a stable Ca. P. syntrophicum 
culture, cultivations were performed at 20 °C in 50-ml serum vials 
containing 20 ml basal medium supplemented with casamino acids 
(0.05%, w/v), 20 amino acids (0.1 mM each) and powdered milk (0.1%, 
w/v, Hohoemi, Meiji) under an atmosphere of N₂/CO₂ (80:20, v/v) in 
the dark without shaking, unless mentioned otherwise. Information 
regarding the purity check of MK-D1 cultures, as well as additional infor- 
mation about cultivation, is included in the Supplementary Methods. 
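
As an illustration of how the per-litre basal medium recipe above translates to the culture volumes used here, the following minimal Python sketch scales the listed salt quantities to an arbitrary volume (for example, the 20 ml dispensed per serum vial). The dictionary and function names are hypothetical, only the components whose masses are given above are included, and the liquid additives (trace element, Se/W, vitamin and resazurin solutions) are omitted.

```python
# Per-litre masses (g) of the solid basal medium components, as listed in the text.
BASAL_MEDIUM_G_PER_L = {
    "MgCl2.6H2O": 9.47,
    "CaCl2.2H2O": 1.36,
    "NaCl": 20.7,
    "NH4Cl": 0.54,
    "KH2PO4": 0.14,
    "NaHCO3": 2.7,
    "Na2S.9H2O": 0.3,
    "cysteine-HCl": 0.3,
}


def scale_recipe_mg(volume_ml, recipe_g_per_l=BASAL_MEDIUM_G_PER_L):
    """Return the mass (mg) of each component needed for `volume_ml` of medium."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    # g per litre is numerically equal to mg per ml.
    return {name: grams * volume_ml for name, grams in recipe_g_per_l.items()}


# Hypothetical usage: amounts for one 20-ml vial.
vial_amounts_mg = scale_recipe_mg(20)
```

In practice the medium would be prepared in bulk and dispensed anaerobically; the sketch only makes the arithmetic explicit.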


SSU rRNA gene-based analysis 

DNA extraction and PCR mixture preparation were performed on a 
clean bench to reduce contamination. DNA extraction from culture 
samples was performed as described previously. The concentration of 
extracted DNA was measured using a Quant-iT dsDNA High-Sensitivity 
Assay Kit (Life Technologies). PCR amplification was performed using 
the Takara Ex Taq (for conventional clone analysis) or Takara LA Taq (for 
Illumina-based amplicon sequencing (iTAG) of the SSU rRNA gene) 
(Takara Bio), and the reaction mixtures for 
PCR were prepared according to the manufacturer’s instructions. For 
the conventional clone analysis, a universal primer pair 530F/907R™ and 
an archaeal primer pair 340F/932R”™ were used for PCR amplification. 
For iTAG analysis, the universal primer pair 530F/907R, which contained 
overhang adapters at the 5’ ends, was used. The procedures used for 


library construction, sequencing and data analysis were described 
previously”. 


Growth monitoring using qPCR 

For the quantitative analysis, a StepOnePlus Real-Time PCR System 
(Thermo Fisher Scientific) with a SYBR Premix Ex Taq II kit (TaKaRa 
Bio) was used. The candidate phylum Lokiarchaeota-specific primer 
pair MBGB525F/Ar912r was used for amplification of 16S rRNA genes. 
Primer MBGB525F is the complementary sequence of the MGBG525 
probe”, whereas Ar912r is an archaeal universal primer that is a slightly 
modified version of the originally designed primer®. The detailed 
procedure for qPCR is described in the Supplementary Methods. The 
doubling times of MK-D1 were calculated based on the semi-logarithmic 
plot of the qPCR data. 
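
The doubling-time calculation from a semi-logarithmic plot can be sketched as follows. This is a minimal illustration, not the authors' actual procedure (which is described in the Supplementary Methods): it assumes that exponential-phase time points have already been selected, fits a least-squares line to ln(copy number) against time, and reports ln(2)/slope. The function name and the example data are hypothetical.

```python
import math


def doubling_time(times, copies):
    """Doubling time from a least-squares fit of ln(copies) versus time.

    `times` and `copies` are paired measurements from the exponential phase;
    the result is in the same unit as `times`.
    """
    if len(times) != len(copies) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    logs = [math.log(c) for c in copies]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx  # specific growth rate (per unit time)
    if slope <= 0:
        raise ValueError("no net growth over the selected interval")
    return math.log(2) / slope


# Hypothetical data: 16S rRNA gene copies per ml doubling every 20 days.
example_days = [0, 20, 40, 60]
example_copies = [1e4, 2e4, 4e4, 8e4]
example_td = doubling_time(example_days, example_copies)
```

Because the fit is performed on log-transformed values, the choice of which time points count as exponential phase strongly affects the estimate; that selection is left to the user here.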


Growth test with multiple substrates 

To examine the effect of the presence of other substances on the growth 
of MK-D1, medium containing casamino acids, 20 amino acids, pow- 
dered milk and supplemented with an individual substrate (Extended 
Data Table 3) was prepared, followed by qPCR and iTAG analyses. Each 
cultivation condition was set in duplicate; however, the H,-fed cul- 
ture was prepared in triplicate because a previous study’ reported 
that a Lokiarchaeum has the potential to grow with hydrogen based on a 
comparative genome analysis. Detailed culture liquid sampling and 
the subsequent qPCR and iTAG analyses are described in the Supple- 
mentary Information. 


Evaluation of growth temperature 

The test was performed using a basal medium containing casamino 
acids and powdered milk, with a pure co-culture of MK-D1 and Metha- 
nogenium as the inoculum (20%, v/v). The cultures were incubated 
at 4, 10, 15, 20, 25, 30, 37 and 40 °C. All incubations for the test were 
performed in triplicate. After 100 days of incubation, 16S rRNA gene 
copy numbers of MK-D1 were evaluated using qPCR. 


FISH 

Fixation of microbial cells, storage of the fixed cells and standard FISH 
were performed in accordance with a previously described protocol”. 
The 16S rRNA-targeted oligonucleotide probes used in this study are 
listed in Supplementary Table 10. The design of MK-D1-specific probes 
is described in the Supplementary Methods. As clear fluorescent sig- 
nals were not obtained using the standard FISH technique, we used 
an in situ DNA-hybridization chain reaction (HCR) technique”. The 
FISH samples were observed using epifluorescence microscopes (BX51 
or BX53, Olympus) and a confocal laser scanning microscope (Nikon 
A1RMP, Nikon Instech). 


SEM 

Microbial cells were fixed overnight in 2.5% (w/v) glutaraldehyde in the 
casamino acids-20 amino acid medium at 20 °C. The sample prepara- 
tion procedure has been described previously™*. The cell samples were 
observed using field emission-SEM (JSM-6700F, JEOL) or extreme high- 
resolution FIB-SEM (Helios G4 UX, ThermoFisher Scientific). 


Ultrathin sectioning and TEM 

Cells were prefixed with 2.5% (w/v) glutaraldehyde for 2 h. The speci- 
mens were frozen in a high-pressure freezing apparatus (EM-PACT2, 
Leica)°’. The frozen samples were substituted with 2% OsO₄ in acetone 
for 3-4 days at −80 °C, and the samples were warmed gradually to room 
temperature, rinsed with acetone and embedded in epoxy resin (TAAB). 
Thin sections (70 nm) were cut with an ultramicrotome (EM-UC7, 
Leica). Ultrathin sections of the cells were stained with 2% uranyl acetate 
and lead-stained solution (0.3% lead nitrate and 0.3% lead acetate, 
Sigma-Aldrich), and were observed using TEM (Tecnai 20, FEI) at an 
acceleration voltage of 120 kV. 


Cryo-EM 
Owing to the low cell yield of the culture, 400 ml of the MK-D1 culture was 
prepared and concentrated to about 5 ml using a 0.22-μm-pore-size 
polyethersulfone filter unit (Corning) in an anaerobic chamber (95:5 
(v/v) N₂:H₂ atmosphere; COY Laboratory Products). The concentrated 
culture liquid was placed in a glass vial in the anaerobic chamber. After 
that, the head space of the glass vial was replaced by N₂/CO₂ gas (80:20, 
v/v). Immediately before the observation using electron microscopy, 
the glass vial was opened, and the liquid culture was concentrated to 
about 200 μl by centrifugation at 20,400g for 10 min at 20 °C. Sub- 
sequently, 3 μl of the concentrated liquid culture was applied onto a 
Quantifoil Mo grid R1.2/1.3 (Quantifoil MicroTools) pretreated with 
glow-discharge, and was plunge-frozen in liquid ethane using a Vit- 
robot Mark IV (FEI Company) at 4 °C and 95% humidity. 
The frozen grid was mounted onto a 914 liquid-nitrogen cryo-specimen 
holder (Gatan) and loaded into a JEM2200FS electron microscope 
(JEOL) equipped with a field emission electron source operating at 
200 kV and an omega-type in-column energy filter (slit width: 20 eV). 
The images were recorded on a DE-20 direct detector camera (Direct 
Electron) at a nominal magnification of 15,000×, which resulted in 
an imaging resolution of 3.66 Å per pixel, with the total dose under 
20 electrons per Å² using a low-dose system. For electron tomography, 
tilt series images were collected manually in a range of approximately 
±62° at 2° increments. The total electron dose on the specimen per tilt 
series was kept under 100 electrons per Å² to minimize radiation dam- 
age. The tilt series were aligned using gold fiducials and tomograms 
were reconstructed using filtered back projection or SIRT in the IMOD 
software” with an image binning of 5. 
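
The acquisition parameters above imply a simple dose and sampling budget, which the following Python sketch makes explicit. It assumes an idealized, symmetric tilt scheme covering exactly ±62° in 2° steps (the text states that the range was approximate and that the series were collected manually) and an even split of the total dose across tilts; the function and variable names are hypothetical.

```python
def tilt_angles(max_tilt_deg, step_deg):
    """Angles of an idealized symmetric tilt series from -max to +max."""
    n_steps = int(round(2 * max_tilt_deg / step_deg))
    return [-max_tilt_deg + i * step_deg for i in range(n_steps + 1)]


angles = tilt_angles(62, 2)

# Upper bound on the dose per tilt image if the 100 e-/A^2 budget
# were spread evenly over the whole series (an assumption).
TOTAL_DOSE_LIMIT = 100.0  # electrons per square angstrom, per tilt series
dose_per_tilt_limit = TOTAL_DOSE_LIMIT / len(angles)

# Effective pixel size of the reconstructed tomogram after binning by 5.
PIXEL_SIZE_ANGSTROM = 3.66
BINNING = 5
binned_pixel_angstrom = PIXEL_SIZE_ANGSTROM * BINNING
```

Under these assumptions the series comprises 63 images, each receiving at most about 1.6 electrons per Å², and the binned tomogram has a voxel edge of about 18.3 Å (1.83 nm).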


Lipid analysis 

About 120 ml of a highly purified culture sample was concentrated 
using the same method as described above, except that the filtration 
concentration procedure was performed on a clean bench instead of 
the anaerobic chamber. After cell collection, the cells were washed 
with the anaerobic basal medium to eliminate the interfering matrix. 
Subsequently, lipid analysis was conducted on the collected cells following 
the improved method“. For precise qualitative lipid analysis, GC-MS 
was conducted on the 7890 system (Agilent Technologies) to compare 
the retention time and mass fragmentation signatures. 


Stable isotope probing and NanoSIMS analysis 

To confirm utilization of amino acids by MK-D1, a stable-isotope prob- 
ing experiment was performed using a ¹³C- and ¹⁵N-labelled amino acid 
mixture (Cambridge Isotope Laboratories). In brief, 120 ml serum vials 
containing 40 ml basal medium were prepared and supplemented with 
the 20 stable-isotope-labelled amino acids (roughly 0.1 mM of each), 
casamino acids (0.05%, w/v) and non-labelled 20 amino acid mixture 
(0.1 mM of each). Two types of highly purified cultures of MK-D1 were 
used as inocula: a co-culture with Methanobacterium sp. strain MO-MB1 
and a tri-culture with Halodesulfovibrio and Methanogenium. The vials 
were incubated at 20 °C in the dark without shaking for 120 days. A 
reference cultivation was also performed under the same cultivation 
conditions without the addition of the 20 stable-isotope-labelled amino 
acid mixture (Extended Data Table 2). The detailed sample preparation 
and analysis method using NanoSIMS is described in the Supplemen- 
tary Methods. 


Chemical analysis 

The stable carbon isotope compositions of methane and CO₂ in the 
sampled gas phase were analysed as described previously”. Methane 
concentrations were measured by GC (GC-4000, GL Science) using a 
Shincarbon ST 50/80 column (1.0 m × 3.0 mm inner diameter; Shinwa 
Chemical Industries) and a flame ionization detector with nitrogen 
as a carrier gas. 


Amino acid concentrations in pure co-cultures of MK-D1 and Metha- 
nogenium were quantified through a previously described method®™. 
In brief, culture liquid samples were filtered through a 0.2-μm pore-size 
polytetrafluoroethylene filter unit (Millipore) and then subjected to 
acid hydrolysis with 6 M HCl (110 °C, 12 h). The amino acid frac- 
tion was derivatized to N-pivaloyl iso-propyl esters before GC using a 
6890N GC instrument connected to the nitrogen phosphorus and flame 
ionization detectors (Agilent Technologies). For cross-validation of 
qualitative identification of amino acids, GC-MS on the 7890 system 
(Agilent Technologies) was used”. 


Genome sequencing and assembly 

DNA extraction was performed as described previously®. A mate-paired 
library with an average insert size of 3,000 bp was constructed accord- 
ing to the manufacturer’s instructions with the Nextera Mate Pair Library 
Preparation kit (Illumina). Library sequencing was performed using 
the Illumina MiSeq platform (2 × 300 bp), which resulted in 3,822,290 
paired reads. The mate pair reads were processed as follows: adapters 
and low-quality sequences were removed using Trimmomatic v.0.33° 
(ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:8:true LEADING:3 TRAILING:3 
SLIDINGWINDOW:4:20 MINLEN:100), and the linker sequences were 
removed using NextClip v.1.3.1%. De novo assembly was performed 
using SPAdes v.3.1.1% with multiple k-mer sizes (21, 33, 55, 77 and 99), 
which resulted in 3,487 contigs with lengths >500 bp, totalling up to 
14.68 Mb. The software MyCC” was used with default parameters for 
binning based on genomic signatures, marker genes and contig cover- 
ages. As heterogeneity in the sequence can cause highly fragmented 
or redundant contigs, the ambiguous contigs (sequence coverage <5 
or a length <1 kb) and redundant contigs were discarded from binning. 
This resulted in the recovery of genomes related to Lokiarchaeota (that 
is, Ca. P. syntrophicum MK-D1, 4.46 Mb), Halodesulfovibrio (4.13 Mb) 
and Methanogenium (2.33 Mb). Scaffolds for each bin were constructed 
using SSPACE v.3.0 with mate-paired information of Illumina reads. 
To obtain the complete genome sequence of Ca. P. syntrophicum, the 
gaps were filled using Sanger sequencing. Genomes were annotated 
using Prokka v.1.12 and manually curated. The curation involved 
functional domain analysis through CD-Search (CDD v.3.17) with its 
corresponding conserved domain database and InterProScan v.57; 
signal peptide and transmembrane domain prediction through SignalP 
v.4.13; carbohydrate-active enzyme, peptidase and lipase prediction 
through dbCAN v.5.0, MEROPS and lipase engineering database; 
and hydrogenase annotation with assistance from HydDB. In addition, 
to further verify the function, we compared the sequence similarity of 
each gene to enzymes found in UniProtKB/SwissProt that had experi- 
mentally verified catalytic activity and genes with extensive genetic, 
phylogenetic and/or genomic characterizations with a 40% amino 
acid similarity cut-off. For enzymes that have divergent functions even 
with a 40% similarity cut-off (for example, [FeFe] and [NiFe] hydroge- 
nases, 3-oxoacid oxidoreductases, glutamate dehydrogenases and 
sugar kinases), phylogenetic trees were constructed with reference 
sequences to identify association of the query sequences to phyloge- 
netic clusters containing enzymes with characterized catalytic activity. 
Publicly available metagenome-assembled genomes of Asgard archaea 
were annotated in the same manner. 
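
The contig exclusion rule described above (sequence coverage <5 or length <1 kb) can be sketched as a simple filter. This is a minimal illustration only: the `Contig` class, the function names and the toy values are hypothetical, and the removal of redundant contigs, which is not specified in detail here, is not attempted.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    # Hypothetical container; field names are assumptions for illustration.
    name: str
    length_bp: int
    coverage: float


def keep_for_binning(contig, min_coverage=5.0, min_length_bp=1000):
    """Return True unless the contig is 'ambiguous' (coverage <5 or length <1 kb)."""
    return contig.coverage >= min_coverage and contig.length_bp >= min_length_bp


def filter_contigs(contigs):
    """Keep only the contigs that pass both thresholds."""
    return [c for c in contigs if keep_for_binning(c)]


# Toy values, not taken from the study.
contigs = [
    Contig("contig_1", 12000, 35.2),  # passes both thresholds
    Contig("contig_2", 800, 40.0),    # too short
    Contig("contig_3", 5000, 3.1),    # coverage too low
]
kept_names = [c.name for c in filter_contigs(contigs)]
print(kept_names)
```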


Phylogenetic analysis 

Phylogenomic trees of MK-D1 and select cultured archaea, eukaryotes 
and bacteria were calculated. Thirty-one ribosomal proteins conserved 
across the three domains (Supplementary Table 7) were collected from 
MK-D1, the organisms shown in the tree and metagenome-assembled 
genomes (MAGs) of uncultured archaeal lineages (Supplementary 
Table 8). Two alignments were performed in parallel: (1) only including 
sequences from cultured organisms and (2) also including MAG-derived 
sequences. MAFFT v.7 (--linsi) was used for alignment in both cases®°. 
For the latter, MAG-derived sequences were included to generate an 
alignment that maximizes the archaeal diversity that is taken into 
account, but removed for subsequent tree construction to avoid any 
influence of contamination (that is, concatenation of sequences that 
do not belong to the same organism). ‘Candidatus Korarchaeum’ 
sequences were kept in the tree based on the cultured + uncultured 
alignment due to its critical position in TACK phylogeny. After remov- 
ing all-gap positions and concatenation, the maximum-likelihood 
trees were constructed using RAxML-NG v.0.8.0 (fixed empirical 
substitution matrix (LG), 4 discrete GAMMA categories, empirical 
amino acid frequencies and 100 bootstrap replicates) and the Bayes- 
ian inference phylogenies were calculated using MrBayes v.3.2.7a 
(four chains, print/sample frequencies of 100, a relative burn-in of 25% 
(nchains = 4 nruns = 2 printfreq = 100 samplefreq = 100), LG model, 
invariable sites plus GAMMA models of rate variation across sites (prset 
aamodelpr = fixed(lg); lset rates = invgamma)). For 16S ribosomal 
RNA phylogeny, sequences were aligned using SINA against the Silva 
v.132 alignment. The maximum-likelihood tree was calculated using 
RAxML using the same parameters as RAxML-NG. 
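
The alignment post-processing described above (removal of MAG-derived sequences, removal of all-gap positions and concatenation) might be sketched as follows. This is an illustrative sketch rather than the scripts used in the study: the function names, the dictionary-based alignment format and the toy sequences are assumptions, and every retained taxon is assumed to be present in every per-protein alignment.

```python
def drop_taxa(alignment, excluded):
    """Remove sequences (for example, MAG-derived ones) from one alignment."""
    return {name: seq for name, seq in alignment.items() if name not in excluded}


def drop_all_gap_columns(alignment, gap="-"):
    """Remove columns in which every remaining sequence has a gap."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if any(ch != gap for ch in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def concatenate(alignments):
    """Concatenate per-protein alignments; assumes identical taxon sets."""
    names = list(alignments[0])
    return {name: "".join(aln[name] for aln in alignments) for name in names}


# Toy data: two hypothetical protein alignments; "MAG1" stands in for a MAG.
protein_1 = {"taxonA": "MK-L", "taxonB": "M--L", "MAG1": "MKQL"}
protein_2 = {"taxonA": "GG", "taxonB": "GA", "MAG1": "GG"}

cleaned = [
    drop_all_gap_columns(drop_taxa(aln, excluded={"MAG1"}))
    for aln in (protein_1, protein_2)
]
supermatrix = concatenate(cleaned)
print(supermatrix)
```

In the toy example, the third column of the first alignment is supported only by the excluded sequence, so it becomes an all-gap column and is dropped before concatenation.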

For analysis of urocanate hydratase, serine/threonine dehydratase, 
succinate dehydrogenase flavoprotein, fatty-acid-CoA ligase and 
3-ketoacyl-CoA thiolase, homologues were collected through BLASTp 
analysis of the Asgard archaea sequences against the UniProt data- 
base (release 2019_05). Asgard archaea protein sequences unavail- 
able in GenBank or UniProt (that is, those without accession numbers 
in the trees) were predicted with Prokka v.1.13 (--kingdom Archaea 
--rnammer) using the genome assemblies available in GenBank. Of 
homologues with sequence similarity >40% and overlap >70%, repre- 
sentative sequences were selected using CD-HIT v.4.8.1 with a cluster- 
ing cut-off of 70% similarity (default settings otherwise). Additional 
homologues with verified biochemical activity, sequence similarity 
>30%, and overlap >70% were collected through BLASTp analysis of 
the Asgard archaea sequences against the UniProt/SwissProt database 
(2019_05). Sequences were aligned using MAFFT v.7 with default 
settings (or MUSCLE v.3.8.31 where noted) and trimmed using trimAl 
v.1.2 (settings are specified in the caption for each corresponding 
phylogenetic tree). RAxML-NG was used for tree construction with 
the same parameters as above (or PhyML v.3.3 with 100 bootstrap rep- 
licates, LG model and empirical amino acid frequencies where noted). 
For analysis of biotin ligase and biotin carboxyl carrier protein, the 
phylogenetic tree was constructed using FastTree using the LG model 
and 1,000 bootstrap replicates. 
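
The two homologue selection thresholds described above (similarity >40% with overlap >70%, and similarity >30% with overlap >70% for biochemically verified sequences) can be illustrated with a small filter. This is a sketch under stated assumptions: the hit records, their field names and the definition of overlap as aligned length divided by query length are hypothetical and are not taken from the study.

```python
def overlap_percent(hit):
    """Overlap as a percentage of the query length (an assumed definition)."""
    return 100.0 * hit["aligned_length"] / hit["query_length"]


def select_homologues(hits, min_similarity=40.0, min_overlap=70.0):
    """Keep hits whose similarity and overlap both exceed the cut-offs."""
    return [
        h for h in hits
        if h["similarity"] > min_similarity and overlap_percent(h) > min_overlap
    ]


# Toy hit table with hypothetical identifiers and values.
hits = [
    {"subject": "hitA", "similarity": 55.0, "aligned_length": 450, "query_length": 500},
    {"subject": "hitB", "similarity": 35.0, "aligned_length": 480, "query_length": 500},
    {"subject": "hitC", "similarity": 60.0, "aligned_length": 300, "query_length": 500},
]

general = [h["subject"] for h in select_homologues(hits)]
verified = [h["subject"] for h in select_homologues(hits, min_similarity=30.0)]
print(general, verified)
```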


RNA-based sequencing analysis 
To perform RNA-based sequencing analysis, 100 ml of culture liquid 
was prepared from 5 highly purified cultures that were incubated with 
casamino acids, 20 amino acids and powdered milk for about 100 days 
at 20 °C. Before RNA extraction, the growth of MK-D1 was confirmed 
using qPCR, and the cell density levels were around 10° copies ml−1 in 
each culture. 

To collect microbial cells, the culture liquid was filtered through 
a 0.22-µm pore-size mixed cellulose ester membrane filter 
(GSWP01300, Merck Millipore) on a clean bench. After filtration, 
the membrane was cut in half with sterilized scissors and then directly 
inserted into the PowerBiofilm bead tubes of a PowerBiofilm RNA 
Isolation kit (MO BIO Laboratories). The following RNA extraction 
procedures were performed according to the manufacturer’s instruc- 
tions. The extracted RNA was applied to an RNA Clean & Concentra- 
tor Kit-5 (Zymo Research) for concentration. The obtained RNA was 
quantified using an Agilent 2100 Bioanalyzer system with an RNA 
Pico kit (Agilent Technologies) and then applied to an Ovation Uni- 
versal RNA-Seq System (NuGEN Technologies) for the construction 
of an RNA-sequence library. At the step for Insert Dependent Adaptor 
Cleavage technology-mediated adaptor cleavage during the library 
construction, specific primers for 16S rRNA and 23S rRNA genes of 
MK-D1 were used to reduce rRNA gene sequences from the cDNA 
pool. The constructed cDNA library was sequenced using the MiSeq 
platform (Illumina). 

The raw RNA sequencing data were trimmed by removal of the adapt- 
ers and low-quality sequences using Trimmomatic v.0.33. The expres- 
sion abundance of all coding transcripts was estimated in RPKM values 
using EDGE-pro v.1.3.1. 
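
For orientation, the standard RPKM definition (reads per kilobase of transcript per million mapped reads) can be written as a short function. This is a generic sketch of the formula with toy numbers; it does not reproduce the internal read-assignment steps of EDGE-pro, and the function name is hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Toy example: 500 reads on a 1,000-bp gene in a library of 2,000,000 mapped reads.
example_value = rpkm(500, 1000, 2_000_000)
print(example_value)
```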


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


Genomes for Ca. P. syntrophicum MK-D1, Halodesulfovibrio sp. MK-HDV 
and Methanogenium sp. MK-MG are available under GenBank BioProject 
accession numbers PRJNA557562, PRJNA557563 and PRJNA557565, 
respectively. The iTAG sequence data were deposited in BioProject 
PRJDB8518 with SRA accession numbers DRR184081–DRR184101. The 
16S rRNA gene sequences of MK-D1, Halodesulfovibrio sp. MK-HDV, 
Methanogenium sp. MK-MG and clones obtained from primary enrich- 
ment culture were deposited in the DDBJ/EMBL/GenBank database 
under accession numbers LC490619–LC490624. The gene expression 
data of MK-D1 were deposited in BioProject PRJDB9032 with the accession 
number DRR199588. The cryo-electron tomograms of Ca. P. syntrophicum 
MK-D1 have been deposited in the EMDB with accession codes EMD- 
0809 and EMD-08532. 
Extended Data Fig. 1 | Growth of MK-D1. a, Effect of temperature on growth of 
MK-D1. Data are mean ± s.d. of triplicate determinations. Each data point is 
shown as a dot. The temperature range test was performed twice with similar 
results. b, c, The amino acid concentrations and growth curves of MK-D1 in pure 
cocultures at 20 °C. Results from cultures 1 (b) and 2 (c) are shown. Please note 
that the initial concentrations of amino acids were normalized to 100%. Total 
amino acids and several representative amino acids (Val, valine; Leu, leucine; 
Ile, isoleucine) are independently shown for the duplicate culture samples. 
Detailed iTAG-based community compositions of the cultures are shown in 
Supplementary Table 1. 
Extended Data Fig. 2 | Circular representation of MK-D1 genome. From the 
outside to the centre: the distribution of the coding sequences based on the 
conserved (orange) or non-conserved (grey) genes in the first circle, non- 
coding RNAs in the second circle, GC content showing deviation from average 
(40.7%) in the third circle, and GC skew in the fourth circle. The GC content and 
GC skew were calculated using a sliding window of 2 kb in step of 10 kb. The 
coding sequences and RNA genes illustrate the findings for plus and minus 
strands. 
Extended Data Fig. 3 | Other representative photomicrographs of MK-D1 
cultures and Methanobacterium sp. strain MO-MB1. a, b, Fluorescence 
images of cells from enrichment cultures after 8 (a) and 11 (b) transfers stained 
with DAPI (violet) and hybridized with nucleotide probes that target MK-D1 
(green) and Bacteria (red). The images are different fields of view to those 
shown in Fig. 1b, c, which were taken at the same time. c, A fluorescence image 
of cells in the enrichments after 11 transfers hybridized with nucleotide probes 
that target MK-D1 (green) and Archaea (but with one mismatch against MK-D1; 
red). Large and irregular coccoid-shaped cells stained by only ARC915 are 
probably Methanogenium. d, e, Dividing cells of MK-D1 with a bleb. The top- 
right inset image in e shows a magnification of the bleb. f, g, Cryo-EM images of 
MK-D1 cells and large membrane vesicles (white arrows). h, i, Ultrathin sections 
of MK-D1 cells with a membrane vesicle. The image i shows a magnified image of 
h. j, k, SEM images of MK-D1 cells with protrusions. l, Ultrathin section of a 
MK-D1 cell with a protrusion. m, n, Photomicrographs of pure culture 
of Methanobacterium sp. strain MO-MB1 cells stained with SYBR Green I. Phase- 
contrast (m) and fluorescence (n) images of the same field are shown. a, b, The 
FISH experiments were performed three times with similar results. d, e, j, k, The 
SEM images are representative of n = 122 recorded images that were obtained 
from four independent observations from four culture samples. The lipid 
composition experiments were repeated twice and gave similar results. 
f, g, The cryo-EM images are representative of n = 14 recorded images that were 
taken from two independent observations from two culture samples. h, i, l, The 
ultrathin-section images are representative of n = 131 recorded images that 
were obtained from six independent observations from six culture samples. 
m, n, The SYBR Green I staining experiment was performed once, but all 10 
recorded images showed similar results. Detailed iTAG analyses of cultures are 
shown in Supplementary Table 1. 
Extended Data Fig. 4 | Ribosomal protein- and 16S rRNA gene-based 
phylogeny of MK-D1. a, Phylogenomic tree of MK-D1 and select cultured 
archaea, eukaryotes and bacteria based on 31 ribosomal proteins conserved 
across the three domains (Supplementary Table 7). Ribosomal protein 
sequences of MK-D1, the organisms shown in the tree and MAGs of uncultured 
archaeal lineages (Supplementary Table 8) were aligned individually using 
MAFFT. MAG-derived sequences (except for Ca. Korarchaeum) were then 
removed for tree construction. After removing all-gap positions and 
concatenation, the maximum-likelihood tree was constructed using RAxML-NG. 
Bootstrap values around critical branching points are also shown. In total, 
14,875 sites of the alignment were used for tree construction. b, A ribosomal 
protein-based phylogenomic tree constructed using MrBayes. Bayesian 
inference phylogenies were calculated using MrBayes 3.2.7a and a ribosomal 
protein concatenated alignment used for Fig. 4a. c, Phylogenetic tree of MK-D1 
and related archaea based on 16S rRNA genes. The 16S rRNA gene sequences 
were aligned using SINA against the Silva v.132 alignment and the maximum- 
likelihood tree was calculated using RAxML. 


Extended Data Fig. 5 | Amino acid, cofactor and nucleotide biosynthesis 
capacities of MK-D1 and other Asgard archaea. Genomes that encode 
proteins for the synthesis of amino acids, cofactors and nucleotides from 
pyruvate or acetyl-CoA (dark blue) and synthesis from other intermediates 
(light blue) are indicated. Those without complete pathways from pyruvate 
and/or acetyl-CoA are indicated in white. Halodesulfovibrio sp. strain MK-HDV 
and Methanogenium sp. strain MK-MG isolated in this study are also shown. 
Extended Data Fig. 6 | Maximum-likelihood tree of Asgard archaea 
urocanate hydratase. Urocanate hydratase (HutU) homologues were obtained 
by BLASTp analysis of the Asgard archaea sequences against the UniProt 
database (release 2019_06). Of homologues with sequence similarity >40% and 
overlap >70%, representative sequences were selected using CD-HIT with a 
clustering cut-off of 70% similarity (otherwise default settings were used). 
Additional homologues with verified biochemical activity, sequence similarity 
>30% and overlap >70% were obtained by BLASTp analysis of the Asgard 
archaea sequences against the UniProt/SwissProt database (2019_05). 
Sequences were aligned using MAFFT v.7 with default settings and trimmed 
using trimAl v.1.2 with default settings. The maximum-likelihood tree was 
constructed using RAxML-NG using fixed empirical substitution matrix (LG), 
4 discrete GAMMA categories, empirical amino acid frequencies from the 
alignment and 100 bootstrap replicates. In total, 876 sites of the alignment 
were used for tree construction. 
Extended Data Fig. 7 | Maximum-likelihood tree of Asgard archaea L- 
threonine/L-serine dehydratase. a, Tree calculated for target Asgard archaea 
L-threonine/L-serine dehydratase (TdcB) and homologues. TdcB homologues 
were obtained by BLASTp analysis of the Asgard archaea sequences against the 
UniProt reference proteome and SwissProt database (release 2019_06). Of 
homologues with sequence similarity >40%, overlap >70% and predicted 
prosite domain PS00165 (serine/threonine dehydratases pyridoxal-phosphate 
attachment site), representative sequences were selected using CD-HIT with a 
clustering cut-off of 70% similarity (otherwise default settings were used). 
Additional homologues with verified biochemical activity, sequence similarity 
>30% and overlap >70% were obtained by BLASTp analysis of the Asgard 
archaea sequences against the UniProt/SwissProt database (2019_05). 
Sequences were aligned using MAFFT v.7 with default settings. Positions with 
gaps in more than 10% of the sequences were excluded from the alignment 
using trimAl v.1.2 (-gt 0.9; and otherwise default settings were used). The 
maximum-likelihood tree was constructed using PhyML using a fixed empirical 
substitution matrix (LG), 4 discrete GAMMA categories, empirical amino acid 
frequencies from the alignment and 100 bootstrap replicates (-b 100 -d aa -m 
LG -v e). In total, 308 sites of the alignment were used for tree construction. 
b, Tree calculated for a subset of sequences contained in a section of the 
original tree (branches that are coloured blue). Sequences were realigned and 
trimmed as described for a. In total, 308 sites of the alignment were used for 
tree construction. 


Extended Data Table 1 | SSU rRNA gene clones obtained from the primary and six successive transferred enrichment 
cultures 

Columns: Phylotype name | No. of clones | Accession no. | Sequence length (bp) | Closest cultured species or clone (accession number) | Sequence identity (%) | Phylogenetic affiliation | Identical or almost identical clones detected from the AOM bioreactor enrichment (accession number, sequence identity %)ᵃ 

<Primary enrichment culture> 

Clone library (…F/907R) 

111_U1 | 40 | LC490621 | 374 | Halodesulfovibrio aestuarii strain Sylt3 (NR_116770) | 99 | genus Halodesulfovibrio | - 
111_U2 | 3 | - | 377 | Methylobacter marinus strain A45 (NR_025132) | 100 | genus Methylobacter | MK903D_B19 (AB831411, 100%) 
111_U3 | 2 | - | 374 | Photobacterium indicum strain NBRC 14233 (NR_113657) | 100 | genus Photobacterium | MK903D_B9 (AB831402, 100%) 
111_U4 | 1 | LC490622 | 377 | subseafloor sediment clone ODP1251B13.14 (AB177314) | 99 | subgroup 21 within the phylum Acidobacteria | - 
111_U5 | 1 | LC490623 | 377 | hydrothermal seep sediment BAC_OTU_13 (KP091106) | 100 | GIF9 group within the class Dehalococcoidia | MKOD_B60 (AB831337, 99.5%) 
111_U6 | 1 | LC490624 | 374 | Roseovarius gaetbuli strain YM-20 (NR_134163) | 99 | genus Roseovarius | - 

Clone library (340F/932R) 

111_A1 | 6 | - | 535 | Methanococcoides burtonii strain DSM 6242 (NR_074242) | 99 | genus Methanococcoides | MK903D_A2 (AB831282, 100%) 
111_A2 | 5 | LC490620 | 513 | Methanogenium cariaci strain JR1 (NR_104730) | 99 | genus Methanogenium | - 
111_A3 | 2 | - | 534 | methane seep clone AN_5119N_arc_E4_T3 (KM356859) | 99 | ANME-2a | MKOD_A9 (AB831268, 100%) 
111_A4 | 2 | LC490619 | 516 | methane seep clone AC_5120_arc_D2_T3 (KM356804) | 99 | Lokiarchaeota (Ca. P. syntrophicum strain MK-D1) | MK903R_A35 (AB831305, 99.0%) 

<Six successive transferred enrichment culture> 

111-5_U1 | 40 | - | 374 | Halodesulfovibrio oceani strain 1.8.1 (NR_116768) | 100 | genus Halodesulfovibrio | - 
111-5_U2 | 6 | - | 380 | methane seep clone AC_5120_arc_D2_T3 (KM356804) | 100 | Lokiarchaeota (Ca. P. syntrophicum strain MK-D1) | - 
111-5_U3 | 1 | - | 380 | Methanogenium boonei strain AK-7 (NR_115706) | 99 | genus Methanogenium | - 

ᵃThe clone sequences have been reported in our previous study. 
Extended Data Table 2 | Carbon isotope fractionation values in MK-D1 cultures after 120 days incubation with and without 
stable isotope labelled amino acids 

Culture ID | δ13C-CO2 (‰ VPDB)ᵃ | δ13C-CH4 (‰ VPDB)ᵃ 

Co-cultures with Methanobacterium 
No.1 with stable isotope labeled AAs | -12.3 | 4094.8 
No.2 with stable isotope labeled AAs | -9.3 | 6990.7 
No.3 w/o stable isotope labeled AAs | -23.1 | -36.7 
No.4 w/o stable isotope labeled AAs | -23.1 | -33.1 

Tri-cultures with Halodesulfovibrio and Methanogenium 
No.5 with stable isotope labeled AAs | 318.5 | 86.0 
No.6 with stable isotope labeled AAs | 309.3 | 87.8 
No.7 w/o stable isotope labeled AAs | -22.6 | -95.5 
No.8 w/o stable isotope labeled AAs | -22.8 | -97.8 

ᵃParts per thousand (‰) compared with the Vienna Pee Dee Belemnite (VPDB) standard. 


Extended Data Table 3 | Growth of MK-D1 after incubation of 120 days with a range of substrates 

Columns: Culture name | Substrate | Initial MK-D1 16S rRNA gene copies per ml of culture | Final MK-D1 16S rRNA gene copies per ml of culture | No. of MK-D1 16S rRNA gene copies relative to initial culture | Community compositions evaluated by iTAG analysis (%)ᵃ: MK-D1 | Methanogenium | Methanobacterium sp. strain MO-MB1 | Others 

Inoculum | Casamino acids (CA)ᵇ + 20 amino acids mixture (AAs)ᶜ + powdered milk (PM)ᵈ | - | 5.91E+05 | - | 39.8 | 36.8 | 23.3 | 0.01 
Control-1 | CA + 20 AAs + PM | 1.42E+04 | 1.62E+05 | 11.36 | 76.7 | 21.8 | 1.4 | 0.03 
Control-2 | CA + 20 AAs + PM | 4.67E+03 | 6.55E+04 | 14.03 | 60.3 | 38.0 | 1.6 | 0.04 
H2-1 | CA + 20 AAs + PM + 1.5 kPa H2ᵉ + 10 mM 2-bromoethane sulfonate (2-BES)ᶠ | 9.46E+03 | 4.35E+03 | 0.46 | - | - | - | - 
H2-2 | CA + 20 AAs + PM + 1.5 kPa H2 + 10 mM 2-BES | 1.37E+04 | 3.28E+03 | 0.24 | - | - | - | - 
H2-3 | CA + 20 AAs + PM + 1.5 kPa H2 + 10 mM 2-BES | 3.10E+04 | 8.27E+03 | 0.27 | - | - | - | - 
Formate-1 | CA + 20 AAs + PM + 1 mM Formate + 10 mM 2-BES | 2.76E+04 | 2.00E+03 | 0.07 | - | - | - | - 
Formate-2 | CA + 20 AAs + PM + 1 mM Formate + 10 mM 2-BES | 1.46E+04 | 9.49E+03 | 0.65 | - | - | - | - 
Nitrate-1 | CA + 20 AAs + PM + 500 µM Nitrateᵍ | 2.13E+04 | 8.43E+03 | 0.40 | - | - | - | - 
Nitrate-2 | CA + 20 AAs + PM + 500 µM Nitrate | 1.47E+04 | 5.19E+03 | 0.35 | - | - | - | - 
Sulfate-1 | CA + 20 AAs + PM + 500 µM Sulfate | 5.28E+03 | 9.21E+04 | 17.42 | 79.5 | 19.5 | 1.0 | 0.03 
Sulfate-2 | CA + 20 AAs + PM + 500 µM Sulfate | 3.39E+04 | 5.28E+04 | 1.56 | - | - | - | - 
Thiosulfate-1 | CA + 20 AAs + PM + 500 µM Thiosulfate | 1.23E+04 | 5.00E+04 | 4.05 | - | - | - | - 
Thiosulfate-2 | CA + 20 AAs + PM + 500 µM Thiosulfate | 2.29E+04 | 6.09E+04 | 2.66 | - | - | - | - 
Lactate-1 | CA + 20 AAs + PM + 1 mM Lactate | 5.31E+03 | 1.31E+04 | 2.46 | - | - | - | - 
Lactate-2 | CA + 20 AAs + PM + 1 mM Lactate | 1.53E+04 | 1.91E+04 | 1.25 | - | - | - | - 
Acetate-1 | CA + 20 AAs + PM + 1 mM Acetate | 2.63E+04 | 9.17E+04 | 3.48 | - | - | - | - 
Acetate-2 | CA + 20 AAs + PM + 1 mM Acetate | 1.56E+04 | 2.13E+04 | 1.36 | - | - | - | - 
Glucose-1 | CA + 20 AAs + PM + 1 mM Glucose | 1.12E+04 | 1.16E+05 | 10.33 | 73.8 | 24.3 | 1.9 | 0.03 
Glucose-2 | CA + 20 AAs + PM + 1 mM Glucose | 1.06E+04 | 1.06E+05 | 10.01 | 70.3 | 28.0 | 1.7 | Not detected 
Fructose-1 | CA + 20 AAs + PM + 1 mM Fructose | 3.18E+04 | 3.31E+04 | 1.04 | - | - | - | - 
Fructose-2 | CA + 20 AAs + PM + 1 mM Fructose | 1.79E+04 | 1.44E+05 | 8.08 | - | - | - | - 
Xylose-1 | CA + 20 AAs + PM + 1 mM Xylose | 2.82E+04 | 6.79E+03 | 0.24 | - | - | - | - 
Xylose-2 | CA + 20 AAs + PM + 1 mM Xylose | 9.25E+03 | 1.18E+05 | 12.73 | 61.4 | 36.5 | 2.1 | 0.01 
Ribose-1 | CA + 20 AAs + PM + 1 mM Ribose | 1.42E+04 | 2.88E+04 | 2.02 | - | - | - | - 
Ribose-2 | CA + 20 AAs + PM + 1 mM Ribose | 7.34E+03 | 2.29E+04 | 3.13 | - | - | - | - 
Maltose-1 | CA + 20 AAs + PM + 1 mM Maltose | 2.84E+04 | 1.21E+05 | 4.25 | - | - | - | - 
Maltose-2 | CA + 20 AAs + PM + 1 mM Maltose | 2.17E+04 | 4.55E+04 | 2.09 | - | - | - | - 
Citrate-1 | CA + 20 AAs + PM + 1 mM Citrate | 3.36E+04 | 1.20E+05 | 3.56 | - | - | - | - 
Citrate-2 | CA + 20 AAs + PM + 1 mM Citrate | 1.82E+04 | 5.73E+04 | 3.15 | - | - | - | - 
Pyruvate-1 | CA + 20 AAs + PM + 1 mM Pyruvate | 1.73E+04 | 9.37E+04 | 5.42 | - | - | - | - 
Pyruvate-2 | CA + 20 AAs + PM + 1 mM Pyruvate | 2.22E+04 | 4.86E+03 | 0.22 | - | - | - | - 
Fumarate-1 | CA + 20 AAs + PM + 1 mM Fumarate | 3.16E+04 | 7.20E+04 | 2.28 | - | - | - | - 
Fumarate-2 | CA + 20 AAs + PM + 1 mM Fumarate | 1.94E+04 | 2.35E+04 | 1.21 | - | - | - | - 
Archaeal cell-1 | CA + 20 AAs + PM + archaeal cell membrane componentsʰ | 1.53E+04 | 1.42E+05 | 9.27 | 81.5 | 17.5 | 0.8 | 0.3 
Archaeal cell-2 | CA + 20 AAs + PM + archaeal cell membrane components | 4.17E+04 | 1.05E+05 | 2.52 | - | - | - | - 

A dash indicates that data were not taken for that sample. ᵃThe iTAG analysis was performed for samples in which an increase of about 10 times or more in 16S rRNA gene copy numbers of MK-D1 
was observed after incubation; data were analysed by qPCR assay. Detailed results are shown in Supplementary Table 1. ᵇFinal concentration of casamino acids was 0.05% (w/v). ᶜFinal concen- 
tration of each amino acid was 0.1 mM. ᵈPowdered milk for babies (Hohoemi, Meiji) was used at a final concentration of 0.1% (w/v). ᵉThe concentration of hydrogen gas was in the head space of 
the culture bottle. ᶠ2-BES was added to inhibit methanogens. ᵍAddition of nitrate completely suppressed the growth of MK-D1. This is probably because nitrate inhibits formate dehydrogenase 
activity of MK-D1. ʰArchaeal cell membrane components were a mixture of phytol, intact polar lipid-glycerol-dialkyl-glycerol tetraethers and core lipid-glycerol-dialkyl-glycerol tetraethers 
(each at a final concentration 50 ng ml−1). We used the archaeal membrane components as these have a positive effect on the growth of some archaeal species: (i) archaeal cell extract including 
membrane lipids stimulates the growth of the extremely thermophilic archaeon Thermocladium modestius, and (ii) the hyperthermophilic archaeon Thermofilum pendens requires the polar 
lipids for growth, which was obtained from the archaeal species Thermoproteus tenax. 
Changes in behaviour resulting from environmental influences, development and 
learning are commonly quantified on the basis of a few hand-picked features 
(for example, the average pitch of acoustic vocalizations), assuming discrete classes 
of behaviours (such as distinct vocal syllables). However, such methods 
generalize poorly across different behaviours and model systems and may miss 
important components of change. Here we present a more-general account of 
behavioural change that is based on nearest-neighbour statistics, and apply it to 
song development in a songbird, the zebra finch. First, we introduce the concept of 
‘repertoire dating’, whereby each rendition of a behaviour (for example, each 
vocalization) is assigned a repertoire time, reflecting when similar renditions were 
typical in the behavioural repertoire. Repertoire time isolates the components of 
vocal variability that are congruent with long-term changes due to vocal learning 
and development, and stratifies the behavioural repertoire into ‘regressions’, 
‘anticipations’ and ‘typical renditions’. Second, we obtain a holistic, yet low- 
dimensional, description of vocal change in terms of a stratified ‘behavioural 
trajectory’, revealing numerous previously unrecognized components of behavioural 
change on fast and slow timescales, as well as distinct patterns of overnight 
consolidation across the behavioural repertoire. We find that diurnal changes in 
regressions undergo only weak consolidation, whereas anticipations and typical 
renditions consolidate fully. Because of its generality, our nonparametric description 
of how behaviour evolves relative to itself, rather than to a potentially arbitrary, 
experimenter-defined goal, appears well suited for comparing learning and 
change across behaviours and species, as well as biological and artificial systems. 


Zebra finches acquire complex, stereotyped vocalizations through a 
months-long process of sensory-motor learning. During devel- 
opment, syllable order (that is, syntax) and the spectral structure of 
syllables evolve. These two aspects of vocal learning may be medi- 
ated by largely independent mechanisms with distinct anatomical 
substrates. Here we focus on characterizing the development of 
spectral structure. We began our studies by obtaining dense audio 
recordings of five male zebra finches between 35 and 123 days post- 
hatch (dph; mean + standard deviation 73.4 + 18.6 consecutive days 
of recording). Birds were isolated from other males after birth and, 
on average, live-tutored from around 46 to 63 dph (Extended Data 
Fig. 1a). Band-passed (0.35-8 kHz) audio recordings were segmented 
into individual vocal renditions, and represented as song spectrogram 
segments (Fig. 1a; 563,124-1,203,647 renditions per bird). We excluded 
noise and isolated calls from the analyses. 


Behavioural change in single features 

Vocal development is often characterized by considering changes in 
acoustic features such as pitch, frequency modulation or entropy 
variance (Fig. 1b). Such characterizations readily reveal multiple 


timescales of behavioural change: individual features can vary consist- 
ently within a day, display overnight discontinuities, and show drift 
over the duration of weeks or months (Fig. 1b, c). 

We summarize the relation between change at these different 
timescales through a consolidation index (Fig. 1c), which measures 
whether within-day change in a feature (‘span’, Fig. 1c) is maintained 
or lost overnight (‘shift’, Fig. 1c). Weak consolidation corresponds 
to a consolidation index of close to −1 (no consolidation: the shift is 
equal but opposite to span); strong consolidation corresponds to 
an index of close to 0 (perfect consolidation: the shift is 0 days); and 
offline learning to an index of larger than 0. Across 32 commonly 
used acoustic features, the consolidation indices in our data are mostly 
negative, indicating weak consolidation (Fig. 1d, top; median −0.67). 
This finding holds even for random spectral features (Fig. 1d, bottom; 
median −0.64) and is consistent with past accounts of song develop- 
ment in zebra finches. 
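
The limiting cases given above (index of −1 when the shift is equal but opposite to the span, 0 when the shift is zero, and positive values for offline learning) are consistent with an index defined as the ratio of overnight shift to within-day span. The sketch below assumes this ratio form; the function name, argument names and toy feature values are hypothetical.

```python
def consolidation_index(early_day_k, late_day_k, early_day_k_plus_1):
    """Ratio of overnight shift to within-day span for a single feature.

    Assumed definition: span = late(k) - early(k),
    shift = early(k + 1) - late(k), index = shift / span.
    """
    span = late_day_k - early_day_k
    shift = early_day_k_plus_1 - late_day_k
    if span == 0:
        raise ValueError("within-day span is zero; the index is undefined")
    return shift / span


# Toy feature values: three quarters of the within-day change is lost overnight.
ci_example = consolidation_index(1.0, 2.0, 1.25)
print(ci_example)
```

Under this assumed definition, a feature that returns fully to its morning value gives −1, one that holds its evening value gives 0, and one that continues to change in the same direction overnight gives a positive index.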

Individual features, however, may provide an incomplete account of 
change in a complex behaviour such as song vocalizations. To illustrate 
this point, we consider three simple scenarios. In the first two (Fig. 1e, f), 
the change in behaviour that occurs within any given day largely mir- 
rors, on a faster timescale, the slow change that occurs over the course 
Fig. 1 | Fast and slow change in developing zebra finch vocalizations. 
a, Vocalizations at three developmental stages. Dotted lines indicate syllable 
onsets. Crystallized song syllables (middle and bottom) fall into discrete 
categories (syllables i, a, b, c) and form a stereotyped ‘motif’, typically 
resembling the tutor song. b, Time course of one acoustic feature, entropy 
variance, for syllable b. c, Magnification of the region outlined in b, showing a 
period of within-day span (early to late, day k) and overnight shift (late day k to 
early day k + 1). The consolidation index (CI) is approximately −0.75. 
d, Histograms of consolidation indices over pairs of consecutive days, syllables 

of many days or weeks. In the third scenario (Fig. 1g), within-day change 
is partly ‘misaligned’ with slow change: that is, it involves behavioural 
features that do not consistently change on slower timescales. Within- 
day change could reflect metabolic, neural or other changes that are 
not necessarily congruent with longer-term learning or development; 
the slow change reflects long-term modifications in behaviour that 
are typically equated with learning and development. We abstractly 
refer to these slow components as the direction of slow change (DiSC). 
Notably, simulations of these scenarios show that negative consoli- 
dation indices for single features can result from very different time 
courses of development (Fig. 1h, i). Negative indices occur both when 
within-day and slow changes are closely aligned but daily gains along 
the DiSC are mostly lost overnight (weak consolidation, Fig. 1f), and 
when diurnal gains along the DiSC are perfectly consolidated but within- 
day change is substantially misaligned with slow change (Fig. 1g). The 
broad distributions of indices observed during song development 
(Fig. 1d, top), which also include strongly positive indices, seem more 
consistent with the misaligned scenario (Fig. 1i, histogram 3). 
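
To make the argument concrete, the sketch below projects two toy scenarios onto random directions and computes a shift/span index along each direction. It is an illustration under stated assumptions and not the simulation used in the paper: the two-dimensional geometry, the magnitudes (5, -4, 3) and the function name `projected_indices` are all hypothetical.

```python
import random
from statistics import median


def projected_indices(span_vec, shift_vec, n_proj=10000, seed=0):
    """Index (shift / span) of a toy scenario along random Gaussian directions."""
    rng = random.Random(seed)
    indices = []
    for _ in range(n_proj):
        w = [rng.gauss(0.0, 1.0) for _ in span_vec]
        span = sum(wi * si for wi, si in zip(w, span_vec))
        shift = sum(wi * hi for wi, hi in zip(w, shift_vec))
        if span != 0.0:
            indices.append(shift / span)
    return indices


# Axis 0 plays the role of the DiSC; axis 1 is a direction without slow change.
# In both toy scenarios the net change per day along the DiSC is 1 unit.

# Aligned, weak consolidation: 5 units gained within the day, 4 lost overnight.
aligned_weak = projected_indices([5.0, 0.0], [-4.0, 0.0])

# Misaligned, perfect consolidation along the DiSC: the 1-unit gain is kept,
# while a 3-unit excursion along axis 1 is undone overnight.
misaligned_strong = projected_indices([1.0, 3.0], [0.0, -3.0])

median_aligned = median(aligned_weak)
median_misaligned = median(misaligned_strong)
```

In this toy setting every projection of the aligned scenario yields the same index (about -0.8), whereas the misaligned scenario yields a broad distribution with a negative median and some strongly positive values, even though the index measured exactly along the DiSC is 0.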


Nearest-neighbour measures of change 


We developed a general characterization of change in high-dimensional 
behavioural data, based on nearest-neighbour statistics, that can 
distinguish between the scenarios in Fig. 1e-g. We initially analyse 
song-spectrogram segments of fixed duration aligned to syllable onset 
(Fig. 1a), but later extend our analysis to alternative parameterizations 
of the vocalization behaviour. Vocal renditions are represented as 
real-valued vectors x_i ∈ R^d (where i indexes renditions, and d denotes 
dimension), each associated with a production time, t_i ∈ R (for example, the 
bird’s age when singing x_i). The K-neighbourhood of rendition x_i is given 
by those K renditions (among the set of all renditions) that are closest 
to x_i on the basis of some metric (for example, Euclidean distance). For 
small-enough values of K, different syllable types do not mix within a 
neighbourhood (Extended Data Fig. 1e) and neighbourhood statistics 
are largely independent of cluster boundaries, obviating the need for 
clustering renditions into syllables. 
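
A K-neighbourhood as defined above can be sketched with a brute-force search. The example assumes Euclidean distance and assumes that a rendition is excluded from its own neighbourhood (the text does not specify this); the function name and the toy data are hypothetical.

```python
import math


def k_neighbourhood(renditions, i, k):
    """Indices of the k renditions closest to renditions[i] (Euclidean distance).

    Assumption: rendition i itself is not counted as one of its neighbours.
    """
    dists = [
        (math.dist(renditions[i], x), j)
        for j, x in enumerate(renditions)
        if j != i
    ]
    dists.sort()
    return [j for _, j in dists[:k]]


# Toy two-dimensional 'renditions' with production times in days post-hatch.
renditions = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
times = [50, 51, 58, 70, 71]

neighbours_of_0 = k_neighbourhood(renditions, 0, 2)
neighbourhood_times_of_0 = [times[j] for j in neighbours_of_0]
```

For real data with many renditions, a spatial index or an approximate nearest-neighbour method would replace the brute-force loop; only the returned neighbour indices matter for the statistics that follow.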


and birds, for 32 acoustic features (top) and 32 random spectral projections 
(bottom). e-g, Three scenarios of slow developmental change (grey arrows) 
and fast within-day change in vocalizations. Each point represents the 
distribution of vocalizations from a given time and day. A larger distance 
between points indicates more dissimilar distributions. h, Linear projections 
of the points in g onto two example song features (dotted lines in g) for the 
misaligned, strong-consolidation scenario. Consolidation strength varies 
across directions. i, Consolidation indices over 10,000 random projections 
simulated from the three scenarios (1, 2 and 3 in e-g). 


We visualize all vocalizations produced by a bird throughout development 
with Barnes-Hut t-distributed stochastic neighbour embedding 
(t-SNE), which predominantly preserves local neighbourhoods. 
Each point in the embedding corresponds to a spectrogram segment, 
x_i (Fig. 1a). Different locations correspond to different vocalization 
types (Fig. 2b and Extended Data Fig. 2a). The embedding suggests 
that vocalizations change from undifferentiated subsong (Fig. 2a, 
middle) to clearly differentiated syllables that fall into at least four 
categories (Fig. 2a, syllables a, b, c and introductory note i, as in Fig. 1a). 
The emergence of clustered syllables from unclustered subsong can be 
confirmed by standard clustering approaches (Fig. 2g and Extended 
Data Fig. 1c, d). Notably, the embedding does not preserve all local 
structure in the data, as nearest neighbours in the embedding space are 
not necessarily nearest neighbours in the high-dimensional data space 
(Fig. 2a; black crosses represent high-dimensional neighbours). We 
therefore quantify behavioural change directly in the high-dimensional 
data by analysing the composition of high-dimensional neighbourhoods 
(Extended Data Fig. 2e-g). 

For each data point, we refer to the production times of all data 
points in its K-neighbourhood as ‘neighbourhood production times’ 
(or ‘neighbourhood times’; Fig. 2a, histogram). We summarize the 
neighbourhood times of many data points (Fig. 2d) through ‘pooled 
neighbourhood times’ (Fig. 2c) and the ‘neighbourhood mixing matrix’ 
(Fig. 2e and Extended Data Figs. 2g, 3d). Each value in the neighbour- 
hood mixing matrix represents the similarity between behaviours from 
two production periods. Deviations from zero indicate that behaviours 
from the corresponding production periods are more similar (for values 
greater than 0), in terms of mixing at the level of K-neighbourhoods, or 
less similar (for values smaller than 0) than expected from a shuffling 
null hypothesis. 
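
One way to picture the neighbourhood mixing matrix is as a table of observed neighbour counts between production days, divided by the counts expected if production times were assigned at random, on a base-2 logarithmic scale. The sketch below uses an analytic expectation in place of explicit shuffles, which is an assumption made here for brevity; the exact normalization used in the study is described in its Supplementary Methods, and the function name is hypothetical.

```python
import math
from collections import Counter


def log_mixing_ratio(neighbours, days):
    """Base-2 log of observed over expected neighbour counts per pair of days.

    neighbours[i]: indices of the nearest neighbours of rendition i
    days[i]:       production day of rendition i
    Pairs of days with no observed neighbour edges are reported as -inf.
    """
    n = len(days)
    per_day = Counter(days)
    observed = Counter()
    for i, nbrs in enumerate(neighbours):
        for j in nbrs:
            observed[(days[i], days[j])] += 1

    lmr = {}
    for a in per_day:
        # Number of neighbour slots contributed by renditions from day a.
        slots = sum(len(nbrs) for i, nbrs in enumerate(neighbours) if days[i] == a)
        for b in per_day:
            # Chance that a randomly chosen other rendition is from day b.
            p_b = (per_day[b] - (1 if a == b else 0)) / (n - 1)
            expected = slots * p_b
            obs = observed[(a, b)]
            if obs > 0 and expected > 0:
                lmr[(a, b)] = math.log2(obs / expected)
            else:
                lmr[(a, b)] = float("-inf")
    return lmr


days = [1, 1, 2, 2]
# Abrupt change: each rendition's single neighbour comes from its own day.
segregated = log_mixing_ratio([[1], [0], [3], [2]], days)
# Stationary behaviour: every rendition neighbours all others.
mixed = log_mixing_ratio([[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]], days)
```

In the segregated toy case the same-day entries are positive and the across-day entries are -inf; in the fully mixed case all entries are 0, matching the interpretation of values above and below zero given in the text.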

We use multidimensional scaling on the mixing matrix to represent 
the similarity between behaviours from different production times as 
a ‘behavioural trajectory’ (Fig. 2h). Each point on the trajectory 
represents the distribution of all vocalizations produced on a given day. 
Pairwise distances between points represent the dissimilarity between 
distributions (Extended Data Fig. 2e-g). Here we focus on a 16-day 
Fig. 2 | Neighbourhood mixing and repertoire dating. a, t-SNE of all 
vocalizations from the bird in Fig. 1a. Each point is a syllable rendition. Clusters 
(syllables i, a, b, c) emerge during development. Arrows indicate renditions of 
syllable b from Fig. 1a. Crosses show the 600 nearest neighbours of the 
rendition from day 58. Inset, histogram of production times (neighbourhood 
times) over the 600 nearest neighbours. b, Average spectrograms for different 
locations in the t-SNE visualization from a. c, Pooled neighbourhood times for 
day 70. Percentiles (vertical lines) quantify the extent of the behavioural 
repertoire on day 70. d, Percentiles (5th, 50th and 95th) of neighbourhood 
times for individual renditions from day 70 (each row represents a rendition). 
Rows are sorted by the 50th percentile, the repertoire time (rT, red dots). Left 


phase of gradual change midway through development (Fig. 2f). During 
this phase, the behavioural trajectory is structured differently on fast 
and slow timescales (Extended Data Fig. 3f-h). The two-dimensional 
projection of the trajectory that explains the maximal variance mainly 
reflects the direction of slow change (Figs. 1e-g, 2h). 

The behavioural trajectory summarizes the progressive differentia- 
tion of vocalizations into distinct syllables, as well as simultaneous, 
continuous change in many spectral features of individual syllables. 
Notably, change is characterized through the behavioural trajectory 
by comparing the bird’s song to itself across time, rather than to a tutor 
song. Thus the behavioural trajectory may also reflect innate song priors 
that can result in crystallized song deviating from the tutor song, 
and additional change due to other developmental processes. 


Repertoire extent and consolidation 

Additional t-SNE visualizations of the data suggest that renditions from 
nearby days overlap considerably, whereby changes occurring within a 
day partly mimic the slow change across days (Extended Data Fig. 2b, c). 
We quantify this apparent spread along the DiSC—reflecting different 
degrees of behavioural ‘maturity’—through neighbourhood times 
(Fig. 2d). We refer to behavioural renditions that predominantly have 
neighbours produced in the future as ‘anticipations’, and to renditions 
that predominantly have neighbours that were produced in the past 
as ‘regressions’ (Extended Data Fig. 3b). By contrast, renditions that 
are ‘typical’ for a given developmental stage mostly have neighbours 
produced on the same or nearby days. We denote the median neigh- 
bourhood time as the ‘repertoire time’ of a rendition. The repertoire 
time effectively places each rendition along the DiSC (Fig. 2d, x axis): 
that is, it dates it with respect to the progression of vocal development 
(‘repertoire dating’). A broad distribution of repertoire times across 
all renditions in a day (Fig. 2d) suggests considerable behavioural variability 
along the DiSC; the most extreme regressions are backdated 
and right black dots mark the 5th and 95th percentiles. A small random 
horizontal shift was added to each dot for visualization. e, Mixing matrix for all 
data points depicted in a. Each column of the matrix represents a histogram of 
production times, pooled over all neighbourhoods of points within a day 
(x axis), normalized by a shuffling null hypothesis (LMR, base-2 logarithm of 
the mixing ratio). The black arrow marks the first day of tutoring. f, Average 
mixing matrix for five birds (days 60-75). g, Single-day t-SNE, for three days 
(for the same bird as in Fig. 1a), illustrating the gradual emergence of clusters. 
h, Behavioural trajectory based on f, computed with ten-dimensional 
multidimensional scaling (MDS). Each point corresponds to a day. The two 
dimensions that capture the most variance in the trajectory are shown. 


more than ten days into the past, and the most extreme anticipations 
are post-dated more than ten days into the future. 
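
Repertoire dating of a single rendition can be sketched as follows. The repertoire time is the median neighbourhood time, as defined above; the `tolerance` parameter that separates ‘typical’ renditions from regressions and anticipations is a hypothetical choice made for this illustration, as are the function names and toy values.

```python
from statistics import median


def repertoire_time(neighbourhood_times):
    """Median production time over a rendition's K-neighbourhood."""
    return median(neighbourhood_times)


def classify(neighbourhood_times, production_day, tolerance=2.0):
    """Label a rendition relative to its own production day.

    Assumption: 'tolerance' (in days) is an illustrative threshold, not a
    value taken from the study.
    """
    rt = repertoire_time(neighbourhood_times)
    if rt > production_day + tolerance:
        return "anticipation"
    if rt < production_day - tolerance:
        return "regression"
    return "typical"


# Toy neighbourhood times (days post-hatch) for three renditions sung on day 58.
typical = classify([52, 55, 57, 58, 60], production_day=58)
regression = classify([40, 42, 45, 58, 59], production_day=58)
anticipation = classify([58, 66, 70, 72, 75], production_day=58)
```

Here the three renditions have repertoire times of 57, 45 and 70 days, so the second is backdated by 13 days and the third is post-dated by 12 days.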

To quantify behavioural change on the timescale of hours, we sub- 
divide each day into ten consecutive periods, and compute pooled 
neighbourhood times separately for each period. The percentiles of the 
pooled neighbourhood times chart the evolution of behaviour within 
and across days throughout development (Fig. 3a). Each repertoire-dating 
percentile is akin to a learning curve for a part of the behavioural 
repertoire (for example, typical renditions are described by the 50th 
percentile, and extreme anticipations by the 95th). The evolution of 
each percentile captures the progress along the DiSC (Fig. 3a, y axis) 
over time (Fig. 3a, x axis). We validated this characterization of behavioural 
change on simulated behaviour that mimicked vocal development 
(Extended Data Fig. 4a-d). 
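
The per-period percentiles described above can be sketched as below. The sketch assumes that periods contain equal numbers of renditions in order of production (the text does not state whether periods are defined by count or by clock time) and uses a simple linear-interpolation percentile; both function names are hypothetical.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile (0 <= p <= 100) of a sorted list."""
    if not sorted_vals:
        raise ValueError("empty input")
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def period_percentiles(neighbourhood_times, n_periods=10, ps=(5, 25, 50, 75, 95)):
    """Percentiles of pooled neighbourhood times for consecutive periods of a day.

    neighbourhood_times: one list of neighbour production times per rendition,
    ordered by production time within the day.
    Assumption: periods hold equal numbers of renditions.
    """
    n = len(neighbourhood_times)
    rows = []
    for k in range(n_periods):
        start, stop = k * n // n_periods, (k + 1) * n // n_periods
        pooled = sorted(t for nt in neighbourhood_times[start:stop] for t in nt)
        rows.append([percentile(pooled, p) for p in ps] if pooled else None)
    return rows


# Toy day with four renditions and two periods: early renditions resemble
# behaviour from around day 51, late renditions behaviour from around day 61.
toy_day = [[50, 51, 52], [50, 51, 52], [60, 61, 62], [60, 61, 62]]
rows = period_percentiles(toy_day, n_periods=2)
```

Each row holds the 5th to 95th percentiles for one period; tracking one column across periods and days gives the learning-curve-like trace for that part of the repertoire.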

The repertoire-dating percentiles reveal that typical renditions move 
gradually along the DiSC throughout the day, and that changes along 
the DiSC acquired during the day are, on average, fully consolidated 
overnight (Fig. 3a, b, red). Anticipations undergo a similar or smaller 
degree of within-day change (Fig. 3a, b, 75th and 95th percentiles), 
whereas regressions move by a larger distance within each day, but 
this change is only weakly consolidated overnight (Fig. 3a, b, 5th and 
25th percentiles; Fig. 3e). The most ‘immature’ renditions thus improve 
markedly throughout a day (more than typical renditions or anticipations), 
but these improvements are mostly lost overnight. This pattern 
of change seems to be characteristic of development, as it is absent in 
adults (Extended Data Figs. 5, 6). 

Movement along the DiSC also occurs on timescales that are faster 
than hours, namely within bouts of singing, that is, groups of vocalizations 
that are preceded and followed by a pause (average bout duration 
3.81 ± 0.83 s across birds). We subdivide each bout into ten consecutive 
periods, compute pooled neighbourhood times for each period (over 
all bouts in a day), and track change through the corresponding per- 
centiles (Fig. 3c, d). Within bouts, large changes along the DiSC occur at 
Fig. 3 | Multiple components of behavioural change during sensory-motor 
learning. a, Average repertoire dating percentiles (for five birds) describing 
within and across-day changes along the DiSC. For each production day and 
period, five percentiles of the pooled neighbourhood times (Fig. 2c) are 
arranged vertically (lines). b, Average of data from a across days 60-70, 
expressed relative to the average 50th percentile. c, Within-bout changes. As 
for a, but based on production day and period in a singing bout. d, As for b, but 
averaged across data from c. e, Span and shift for the 5th, 50th and 95th 
percentiles (blue arrows in b, analogous to Fig. 1c) averaged over days 50-80, 
separately for syllables (points) and birds (colours). Black lines indicate 
medians and 95% bootstrapped confidence intervals over all points. 


the regressive tail of the behavioural repertoire: vocalizations are most 
regressive at the onset and offset of bouts (Fig. 3c, d, 5th percentile). 
Similar, albeit weaker changes occur for typical renditions (Fig. 3c, d, 
red). The same apparent changes in song maturity are observed when 
short and long bouts (durations 2.30 ± 0.54 s versus 6.28 ± 1.73 s) are 
considered separately. Song maturity thus decreases at the end of a 
bout, not after a fixed time into the bout (Extended Data Fig. 5a-c). 


Misaligned behavioural components 


The repertoire time reveals within-day and within-bout changes that 
mirror, on a faster timescale, changes that also occur over many days 
(see Supplementary Methods). As above (Fig. 1), we refer to such com- 
ponents of change as being aligned with the DiSC, and to components 
that are not reflected in the repertoire time as being misaligned. 

We identify both aligned and misaligned components of change 
through the ‘stratified mixing matrix’, which combines a neighbour- 
hood-mixing matrix (for example, Fig. 2f) with repertoire dating. Each 
day’s behavioural repertoire is binned into five consecutive production 
periods. Within each period, the behavioural repertoire is subdivided 
into five strata on the basis of repertoire time (Fig. 2d, quintiles). All 
renditions from a day thus fall into 5 × 5 = 25 bins. The stratified mixing 
matrix measures similarity between 50 bins that combine the data from 
two adjacent days (Fig. 3g). We compare the measured stratified mixing 
matrix with simulations that differ with respect to how within-day 
change and change across adjacent days align with the DiSC (Fig. 3f 
and Extended Data Fig. 4e-j). In model 1, development is one-dimensional 
and therefore aligned with the DiSC (Fig. 3f, top; similar to Fig. 1e). 
In model 2, within-day change involves a component that is not aligned 
with the DiSC (Fig. 3f, middle; similar to Fig. 1g). In model 3, adjacent 
days are separated not only along the DiSC, but also along a direction 




f, Simulated stratified mixing matrices (right) for three models (left) of the 
alignment of within-day and across-day change with the DiSC. g, Average 
measured stratified mixing matrices (five birds, days 60-70). h-j, Stratified 
behavioural trajectory based on g. Different two-dimensional projections 
reveal the DiSC (h), as well as within-day (i) and across-day (j) change not 
aligned with the DiSC (labels 1-5 represent different strata). The full ten- 
dimensional trajectories faithfully reproduce the structure of the stratified 
mixing matrices (MDS stress = 0.016); the depicted four-dimensional subspace 
captures 81% of the ten-dimensional variance. k, Separate projections for each 
stratum onto the local DiSC (black arrows in upper diagrams; points represent 
strata from h). 


orthogonal to both the DiSC and the direction of within-day change 
(Fig. 3f, bottom, across-day change). Prominent ‘stripes’ along every 
other diagonal in the measured mixing matrix (Fig. 3g) indicate a larger 
similarity between renditions from the same day than between rendi- 
tions from adjacent days, as predicted by model 3, suggesting that 
several misaligned components contribute to change at fast timescales. 
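
The binning that underlies the stratified mixing matrix can be sketched as below: renditions from one day are split into five consecutive production periods and, within each period, into five strata by repertoire time. The sketch assumes equal-count periods and rank-based quintiles; the function name and the toy repertoire times are hypothetical.

```python
from collections import Counter


def stratify_day(repertoire_times, n_periods=5, n_strata=5):
    """Assign each rendition of a day a (period, stratum) bin.

    repertoire_times: repertoire time per rendition, ordered by production
    time within the day.
    Assumption: periods and strata contain equal numbers of renditions.
    """
    n = len(repertoire_times)
    labels = [None] * n
    for p in range(n_periods):
        start, stop = p * n // n_periods, (p + 1) * n // n_periods
        # Rank the renditions of this period by repertoire time.
        order = sorted(range(start, stop), key=lambda i: repertoire_times[i])
        for rank, i in enumerate(order):
            labels[i] = (p, rank * n_strata // len(order))
    return labels


# Toy day with 50 renditions and distinct, scrambled repertoire times.
toy_rts = [(i * 37) % 50 for i in range(50)]
labels = stratify_day(toy_rts)
bin_sizes = Counter(labels)
```

With 50 toy renditions this produces 5 × 5 = 25 bins of two renditions each; applying the same binning to two adjacent days gives the 50 bins between which mixing is then measured.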

From the stratified mixing matrix, we infer stratified behavioural 
trajectories. The two-dimensional projection that captures most of 
the variance due to strata (Fig. 3h) resembles Fig. 2h and reflects the 
DiSC. Consistent with repertoire dating, behavioural change along the 
DiSC between adjacent days (Fig. 3h, blue versus red for each stratum) 
is small compared with the spread of the behaviour for one day along 
the DiSC (for example, blue points, strata 1-5). For each stratum, however, 
much of the change that occurs within a day is misaligned with 
the DiSC (Fig. 3i, k; early versus late separated along the orthogonal 
dimension of within-day change). Yet another misaligned component 
is necessary to appropriately capture change across adjacent days 
(Fig. 3j). These properties of aligned and misaligned components are 
replicated by a linear analysis based on spectral features that are chosen 
to capture change at specific timescales (Extended Data Figs. 7, 8), and 
are robust to how song is parameterized and segmented, and to how 
nearest neighbours are defined (Extended Data Figs. 9, 10). 


Discussion 

Our analysis of high-dimensional vocalizations reveals that vocal learn- 
ing and development do not reflect an underlying one-dimensional 
process. Single behavioural features in isolation therefore provide an 
incomplete account of behavioural change during development and 
learning. The weak consolidation observed here (Fig. 1d) and elsewhere 
at the level of single features appears to reflect prominent 
misaligned components of within-day change, rather than weak consolidation 
along the DiSC (Fig. 1h, i). Strong overnight consolidation 
along the DiSC across much of the behavioural repertoire (Fig. 3a, b) 
seems consistent with consolidation patterns observed for skilled 
motor learning in humans and for motor adaptation in humans 
and birds. 

Our characterization of behaviour on the basis of nearest-neighbour 
statistics can be applied when no accurate parametric model of the 
behaviour is known, as is the case at present for most natural, complex 
behaviours. The approach is largely complementary to methods that 
rely on clustering behaviour into distinct categories. Forgoing an 
explicit clustering of the data can be advantageous, because assuming 
the existence of clusters can be an unwarranted approximation and 
may impede the characterization of behaviour that appears not to be 
clustered (such as juvenile zebra finch song; Extended Data Fig. 1); 
moreover, determining correct cluster boundaries is in general an 
ill-defined problem. Notably, our analyses require only an indicator 
function that selects nearest neighbours (based here on a ‘locally 
meaningful’ distance metric), a much weaker requirement than a globally 
valid distance metric or the existence of a low-dimensional feature 
space that globally maps behavioural space. These properties make 
repertoire dating applicable to almost any behaviour and other 
high-dimensional datasets, including data that are characterized by ‘labels’ 
other than production time. Repertoire dating may thus provide a general 
account of learning and change that is amenable to comparisons 
between different behaviours and model systems, including different 
species and artificial systems. 
Extended Data Fig. 1 | Clustering of juvenile and adult zebra finch song. 
a, Vocal development in male zebra finches. Tutoring by an adult male started 
at around day 46 (post-hatch) and lasted 10-20 days. b, Time course of the 
acoustic feature frequency modulation (FM), for syllable b in the example bird 
(compare with Fig. 1b). c, Normalized mean silhouette values for 2-10 clusters 
for vocalizations from the seven days shown in d. High values indicate evidence 
for the respective cluster count. Normalized mean silhouette coefficients are 
based on 20 repetitions of k-means clustering of random subsets of 1,000 
68-ms onset-aligned spectrogram segments from a single day (as in d), projected 
onto the first five principal components. d, t-SNE visualizations of 
vocalizations produced on a given day post-hatch for the example bird (bird 4, 
the same bird as in Fig. 1a-c). A separate embedding was computed for each 
day, and the embedding’s initial condition was based on the previous day. Note 
the gradual emergence of clusters, each corresponding to a distinct syllable 
type (for example, syllables i, a, b, c in Fig. 2a). e, Average fraction of neighbours 
from a different cluster, as a function of neighbourhood size. These data are 
analogous to those from c, d but for vocalizations from day 90 (12,854 data 
points), when clusters are fully developed. For a wide range of neighbourhood 
sizes, the neighbours of a data point mostly belong to the same cluster or 
syllable type. For a neighbourhood size of 100, the average fraction of 
out-of-cluster neighbours from the same day is 0.0089. Thus, for an appropriately 
chosen neighbourhood size, nearest-neighbour methods respect clustering 
structure in the data by construction, and sidestep having to explicitly identify 
clusters in the data. In most analyses, we computed nearest neighbours for 
data from all days, meaning that clustering structure is respected even for 
neighbourhood sizes that are slightly larger than those suggested in e. 
Extended Data Fig. 2 | Properties of large-scale embeddings. a, Three 
auditory features computed on renditions of syllable b. This panel uses the 
same embedding as in Fig. 2a, but with different colours. b, Across-day change 
in vocalizations. This is a magnified cutout from the bottom left region of the 
dashed outline of Fig. 2a. The colours differ from Fig. 2a, and points from days 
50-56 only are shown. c, Within-day change in vocalizations. Points from b are 
shown separately for three individual days and coloured according to 
production time within the day (early to late). Vocalizations change within a 
day: early vocalizations (dark green) are more similar to vocalizations from 
previous days (dark green points in b); late vocalizations (light blue) are more 
similar to vocalizations from future days (light blue in b). d, t-SNE visualizations 
for dense recordings from three birds (analogous to Fig. 2a). e-g, Illustration of 
a fictitious behaviour that undergoes distinct phases of abrupt change, no 
change and gradual change, and the identification of these phases on the basis 
of nearest-neighbour graphs. e, A low-dimensional representation of the 
behaviour. Each point corresponds to a behavioural rendition (for example, a 
syllable rendition) and is coloured according to production time. Similar 
renditions (for example, syllable renditions with similar spectrograms) appear 
near each other in this representation. The dotted ellipses mark three subsets 
of points corresponding to: (1) a phase of abrupt change; (2) a phase of no 
change; and (3) a phase of gradual change. f, Nearest-neighbour graphs for the 
three subsets of points in e. Points are replotted from e with different symbols, 
indicating whether their production times fall within the first half (squares) or 
second half (crosses) of the corresponding subset. Edges connect each point to 
its five nearest neighbours. The edge colour marks neighbouring pairs of 
points falling into the same (black) or different (red) halves. Relative counts of 
within- and across-half edges differ according to the nature of the underlying 
behavioural change (histograms of edge counts). If an abrupt change in 
behaviour occurs between the first and second half, nearest neighbours of 
points in one half will all be points from the same half, and none from the other 
half (discontinuity). When behaviour is stationary, the neighbourhoods are 
maximally mixed: that is, every point has about an equal number of neighbours 
from the two halves. Phases of gradual change result in intermediate levels of 
mixing. g, Mixing matrix for the simulated data in e, analogous to Fig. 2e. Each 
location in the matrix corresponds to a pair of production times. Strong mixing 
(white) indicates a large number of nearest-neighbour edges across the two 
corresponding production times (as in f; stationary) and thus similar behaviour 
at the two times. Weak mixing (black) indicates a small number of such edges 
(as in f; discontinuity), and thus dissimilar behaviour. Note that such statistics 
on the composition of local neighbourhoods can be computed for any kind of 
behaviour and are invariant with respect to transformations of the data that 
preserve nearest neighbours, such as scaling, translation and rotation. These 
properties make nearest-neighbour approaches highly general. 
Extended Data Fig. 3 | Repertoire dating and the direction of slow change. a, 
The four nearest neighbours for example vocalizations (bird 4, syllable b, from 
Fig. 1a). Production times of nearest neighbours (numbers) need not equal that 
of the corresponding example rendition. b, Neighbourhood production times 
for three renditions from day 70 (analogous to Fig. 2a, inset). Rendition 2 is 
‘typical’ for day 70 (most neighbours lie in the same or adjacent days); 
renditions 1 and 3 are a ‘regression’ and an ‘anticipation’ (with neighbours 
predominantly produced in the past or future, respectively). c, All renditions 
from day 70 (a subset of the points in Fig. 2a). Colours correspond to repertoire 
time (50th percentile in Fig. 2d). Anticipations (repertoire times greater than 
70) and regressions (repertoire times less than 70) occur at locations 
corresponding to vocalizations typical of later and earlier development 
(compare with Fig. 2a). Numbers 1-3 mark the approximate locations of the 
example renditions in b. d, Mixing matrices for additional birds (analogous to 
Fig. 2e, using the same birds as in Extended Data Fig. 2d). Bird 3 produced only 
very few vocalizations (mostly calls) before tutoring onset (black arrows). The 
mixing matrices consistently show a period of gradual change starting after 
tutor onset and lasting several weeks. This gradual change typically slows down 
(resulting in larger mixing values far from the diagonal) at the end of the 
developmental period considered here (day 90 post-hatch; later periods are in 
Extended Data Fig. 6). Grey values correspond to the base-2 logarithm of the 
mixing ratio (LMR), that is, histograms over the pooled neighbourhood times 
(Fig. 2c) normalized by a null hypothesis obtained from a random distribution 
of production times (see Supplementary Methods). For example, an LMR value 
of 5 implies that renditions from the corresponding pair of production times 
are 2^5 = 32 times more mixed at the level of local neighbourhoods than would be 
expected by chance (that is, under a random distribution of production times 
across renditions). e, As in d, bird 2, but after shuffling production times among 
all data points. Effects under this null hypothesis are small (the maximal 
observed mixing ratio is 2^0.06, or approximately 1.042). Similar, small effects 
under the null hypothesis are obtained for the other mixing matrices discussed 
throughout the text. f-h, Properties of the behavioural trajectory inferred 
from the mixing matrix in Fig. 2f. f, Pairwise distances between points along the 
inferred behavioural trajectory (x axis), plotted against measured disparities 
(y axis). Disparities are obtained by rescaling and inverting the similarities in 
Fig. 2f (see Supplementary Methods). The points on the trajectory are inferred 
with ten-dimensional non-metric MDS on the measured disparities. 
Importantly, the pairwise distances between inferred points faithfully 
represent the corresponding, measured disparities (all points lie close to the 
diagonal; MDS stress = 0.0002). g, h, Structure of low-dimensional projections 
of the behavioural trajectory. We applied principal-component analysis to the 
ten-dimensional arrangement of points inferred with MDS and retained an 
increasing number of dimensions (number of dimensions indicated by 
greyscale). For example, the projection onto the first two principal 
components is shown in Fig. 2h (MDS dimension 2 in g, h). The first two 
principal components explain 75% of the variance in the full ten-dimensional 
trajectory. g, Measured (true) disparity (thick grey curve) and distances along 
the inferred trajectories (points and thin curves) as a function of the day gap (δ) 
between points. For any choice of projection dimensionality and δ, we 
computed the Euclidean distances between any two points separated by δ and 
averaged across pairs of points. The measured (true) disparities increase 
rapidly between subsequent and nearby days, but only slowly between far apart 
days (thick grey curve). Low-dimensional projections of the trajectory (for 
example, MDS dimension 2) underestimate the initial increase in disparities. 
h, Angle between the reconstructed direction of across-day change for inferred 
behavioural trajectories, as a function of the day gap between points. Same 
conventions and legend as in g. For the one- and two-dimensional trajectories, 
the direction of across-day change varies little or not at all from day to day (see 
inset; the arrow indicates the angle of across-day change). On the other hand, 
the direction of across-day change along the full, ten-dimensional behavioural 
trajectory is almost orthogonal for subsequent days. Data shown in g, h 
suggest that the full behavioural trajectory is more ‘rugged’ than indicated by 
the two-dimensional projection in Fig. 2h. This structure is consistent with the 
finding that across-day change includes a large component that is orthogonal 
to the directions of slow change and of within-day change (Fig. 3j). Note that a 
shows 200-ms spectrogram segments, whereas b-h are based on 68-ms 
segments (as are most of the analyses). 
[Extended Data Fig. 4: figure panels not reproduced. Recoverable labels: ‘Model 2 - strong consolidation’; axis ‘production time (days)’; ‘day k’ and ‘day k+1’; ‘early’ and ‘late’; ‘across-day change’; similarity scale from ‘low’ to ‘high’.] 

Extended Data Fig. 4 | Models of the alignment between components of 
change. a-d, Validation of repertoire dating. We simulated individual 
behavioural renditions as points in a high-dimensional space, drawn from a 
time-dependent probability distribution that changes both within and across 
days (see Supplementary Methods), and verified that repertoire dating can 
successfully recover the underlying structure of the models. The main 
parameters determining the relative alignment of the DiSC with the directions 
of within-day and across-day change are: k,, the amount of within-day change 
along the DiSC; g,,, the amount of within-day change orthogonal to the DiSC; 
and b,,, the amount of across-day change orthogonal to the DiSC. These 
parameters are expressed relative to the amount of across-day change along 
the DiSC (thick black arrow in a, b). The two models shown (a, c) are 
characterized by different amounts of overnight consolidation of within-day 
changes along the DiSC. a, Model 1. Within-day change is aligned with the DiSC 
(g, = 0) and is large (k,=5). The component of across-day change orthogonal to 
the DiSC is as large as the component of across-day change along it. In this
scenario, overnight consolidation of within-day changes along the DiSC is weak
for typical renditions (20% of change is consolidated, corresponding to a
consolidation index of -0.8 in Fig. 1c, d). b, Repertoire dating percentiles for 
model 1, analogous to Fig. 3b. The time course of the 50th repertoire-dating 
percentile (typical renditions, red) closely reproduces the dynamics of change 
along the DiSC implied by a: within-day change along the DiSC is large (the red 
line extends over about five days) and consolidation is weak (the starting point 
on day k + 1 relative to day k moves by about 20% of the overall within-day
range). c, Model 2. Within-day change has a large component orthogonal to the
DiSC, whereas across-day change is aligned with the DiSC. In this scenario, 
overnight consolidation of within-day changes along the DiSC is strong (80% of 
the change is consolidated; consolidation index -0.2) for typical renditions.
d, Repertoire dating percentiles for model 2, analogous to b. The time course of
the 50th repertoire dating percentile (typical renditions, red) closely
reproduces the dynamics of change along the DiSC implied by c. In b, d,
differences between anticipations (95th percentile) and regressions (5th
percentile) correctly reflect the underlying model parameters
(see Supplementary Methods). e-j, Validation of stratified behavioural
trajectories. We generated three sets of stratified behavioural trajectories that 
differ with respect to the alignment of within-day and across-day change with 
the DiSC. We built each set of trajectories by arranging 50 points (five strata per 
day, five production time periods per day, on two consecutive days; same
conventions as Fig. 3f, g) within a four-dimensional space. We then generated 
simulated stratified mixing matrices (e-g, replotted from Fig. 3f) by 
computing pairwise distances between all points, and transforming distances 
into similarities. We visualize the behavioural trajectories (h-j) with the same 
two-dimensional projections as in Fig. 3h-j, with the same scale along all
dimensions. In all models, overnight consolidation along the DiSC is perfect 
(strong consolidation) for all strata. e, Model 1: within-day change and across- 
day change occur only along the DiSC. For each stratum (that is, each of the five 
10-by-10 squares along the diagonal), similarity decreases smoothly with time, 
reflecting the gradual progression of the trajectory along the DiSC within and 
across days. f, Model 2: within-day change has a large component that is not 
aligned with the DiSC. g, Model 3: both within-day and across-day change have 
large components that are not aligned with the DiSC. The misaligned 
component of across-day change reduces the similarity between day k and day 
k + 1 compared with model 2, resulting in smaller values in the 5-by-5 squares
comparing points from day k and day k + 1. h, Behavioural trajectories for
model 1: the two-dimensional projection containing the DiSC (top) explains all 
the variance in the trajectories. i, Behavioural trajectories for model 2: similar 
to h, but points from different periods during the day are also displaced along
an orthogonal direction of within-day change (middle). j, Behavioural
trajectories for model 3: similar to i, but points from adjacent days are also
displaced along an orthogonal direction of across-day change (bottom). Note 
that the models in e-j are implemented differently to the models in a-d
(see Supplementary Methods).
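
The construction described for e-j can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation (see Supplementary Methods): the coordinate layout, the offset between strata, the parameter names `within_orth` and `across_orth`, and the exponential distance-to-similarity transform are all assumptions made for the example.

```python
import math
from itertools import product

N_STRATA, N_DAYS, N_PERIODS = 5, 2, 5  # 5 x 2 x 5 = 50 points, as in the caption


def build_points(within_orth=0.0, across_orth=0.0):
    """Place 50 points in a four-dimensional space (hypothetical layout).

    Axis 0: position along the DiSC.
    Axis 1: within-day change orthogonal to the DiSC.
    Axis 2: across-day change orthogonal to the DiSC.
    Axis 3: unused in this toy example.
    """
    points = []
    # Stratum is the outermost loop, so each stratum fills one 10-by-10
    # block along the diagonal of the similarity matrix.
    for stratum, day, period in product(
        range(N_STRATA), range(N_DAYS), range(N_PERIODS)
    ):
        frac = period / N_PERIODS
        # Each day continues from where the previous day ended
        # (perfect consolidation along the DiSC).
        disc = day + frac + 0.5 * (stratum - 2)  # assumed offset between strata
        points.append((disc, within_orth * frac, across_orth * day, 0.0))
    return points


def similarity_matrix(points, scale=1.0):
    """Pairwise Euclidean distances mapped to similarities in (0, 1]."""
    return [[math.exp(-math.dist(p, q) / scale) for q in points] for p in points]


model1 = similarity_matrix(build_points())                  # change along the DiSC only
model2 = similarity_matrix(build_points(within_orth=2.0))   # misaligned within-day change
model3 = similarity_matrix(build_points(within_orth=2.0, across_orth=2.0))
```

Under these assumptions, `model3[0][5]` (same stratum and period on adjacent days) is smaller than `model2[0][5]`, mirroring the reduced similarity between day k and day k + 1 described for model 3.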
Extended Data Fig. 5 | Repertoire dating control analyses. a-c, Within-bout
effects, analogous to Fig. 3c, d. a, Within-bout effects computed only from
renditions that fall into short bouts (the bout length is less than the median).
b, Analogous to a, but computed only from renditions that fall into long bouts
(the bout length is more than the median). The changes in the behavioural
repertoire observed within a bout are qualitatively similar for short and long
bouts (compare a, b; within-bout effects are most pronounced after day 70). In
particular, the song becomes more regressive shortly before the end of a bout
(5th percentile, bottom curves). This suggests that the analogous effect in
Fig. 3c, d occurs at the end of a bout, rather than at a fixed time after the
beginning of a bout. c, Analogous to Fig. 3c, d but computed over the entire
dataset without prior clustering into syllables. The changes in behavioural 
repertoire differ in several respects from those in Fig. 3c, d, which were 
computed on individual syllables and then averaged across syllables
(see Supplementary Methods). Here, the increase in regressions at the bout
end is less pronounced. Moreover, large within-bout changes also occur for
anticipations early in development. Both differences may reflect changes in 
the relative frequency of renditions from each syllable (for example, 
introductory notes) sung throughout a bout. Such changes in frequency can 
affect the results in c, which were computed on the unclustered data, but not
those in a, b. d, Within-day effects, analogous to Fig. 3a, b, but computed for
individual syllables, and then averaged across syllables and animals. The
changes in behavioural repertoire are qualitatively similar to those in Fig. 3a, b,
which were computed using the unclustered data. This similarity implies that 
the dynamics along the direction of slow change in Fig. 3 cannot be explained 
by changes during the day in the relative frequency of renditions from each 
syllable. e, Analogous to Fig. 3a, b but computed after shuffling production 
times among all data points. Within-day changes of the percentile curves are 
small under this null hypothesis. The maximal span of within-day fluctuations 
is 0.2 days, compared with 3.71 for the unshuffled data in Fig. 3b. The total 
repertoire spread (5th to 95th percentiles) is around 40 days, compared with
around 23 days for unshuffled data. The 50th percentile curve is flat, implying
that the shuffled data do not undergo a systematic drift over time (that is, do
not describe a DiSC). The vertical separation between percentiles, then, 
reflects the range of production times in the data, not the spread along the 
DiSC. The time course of the 5th and 95th repertoire dating percentiles should
thus be interpreted as the progression of regressions and anticipations along 
the DiSC only over the range of repertoire times covered by typical renditions 
(that is, approximately the vertical range of the 50th repertoire dating 
percentile). f, Analogous to Fig. 3e but for different distance metrics 
(Euclidean; correlation; Euclidean after time warping) and feature 
representations (32 acoustic features; 1 acoustic feature (entropy variance)). 
See also Extended Data Fig. 9. 
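
The logic of the shuffle control in e can be illustrated with a toy example. The sketch below is not the repertoire-dating analysis itself: it uses a single hypothetical feature that drifts with production time, and Pearson's correlation as a stand-in statistic, to show how reassigning production times at random across data points removes any systematic drift. All names and values are assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the example is reproducible

# Toy data: production times (in days) and a feature that drifts with time.
times = [rng.uniform(60.0, 70.0) for _ in range(1000)]
feature = [t + rng.gauss(0.0, 1.0) for t in times]

# Null hypothesis: the same production times, randomly reassigned to renditions.
shuffled_times = times[:]
rng.shuffle(shuffled_times)

r_data = pearson(times, feature)            # strong relation in the original data
r_null = pearson(shuffled_times, feature)   # relation is lost after shuffling
```

Because the shuffle only permutes the labels, the overall range of production times is preserved while their relation to the behaviour is destroyed, which is why the spread between percentiles in e reflects the range of production times rather than spread along the DiSC.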
Extended Data Fig. 6 | Behavioural change in adult versus juvenile birds.
a-d, Comparison of within-day repertoire dating results during and after the 
end of development (averaged over three birds). Top, juvenile birds; bottom, 
same birds but as adults. a, Large-scale embeddings analogous to Fig. 2a. b,
Repertoire dating percentiles, analogous to Fig. 3a, b. c, Stratified mixing
matrix, analogous to Fig. 3g. d, Stratified behavioural trajectories, analogous 
to Fig. 3h-k. e, Shift and span values for the 50th percentile, for juvenile and 
adult birds. Points indicate individual birds. Song in adult birds is not static, but 
the time course of change differs from that observed in juveniles. First, change
in adults is substantially less than in juveniles (see the slope of the 50th
percentile in the top versus bottom parts of b). Second, the relation of fast
(within-day) and slow (across-day) change differs in juveniles versus adults. In
juveniles, vocalizations move along the DiSC (y axes in b; slow local axis in d)
within each day and the repertoire time of typical renditions increases by about
one day from morning to evening (50th percentile; span is approximately one
day) and is maintained through the next morning (shift is approximately 0
days). In adults, typical renditions do not show within-day progress along the
DiSC (the span is approximately 0 days) but change overnight across days (the
shift is greater than 0 days). In adults, the regressive tail of the repertoire in
particular moves towards smaller values during the day (b, bottom right; 5th
percentile), whereas in juveniles it consistently moves towards larger values
(b, top). In both juvenile and adult birds, within-day change has a strong
component that is misaligned with the DiSC (within-day axis in d).
Extended Data Fig. 7 | Local linear analysis. a-e, We validated the structure of
change inferred with nearest-neighbour statistics (Fig. 3) with an approach 
based on linear regression in the high-dimensional spectrogram space
(see Supplementary Methods). Unlike for the case of nearest neighbour-based
statistics, here each rendition must first be assigned to a cluster (that is, a syllable;
compare with Fig. 2a) and each cluster is analysed separately. a, Illustration of
the linearization scheme. First, we infer the (local) DiSC on days k and k + 1 (grey
arrow) as the vector of linear-regression coefficients relating production day
to variability of renditions from days k - 1 and k + 2. Second, we infer the
direction of within-day change (green arrow) as the linear-regression 
coefficients relating the period within a day to variability of renditions from 
days k and k + 1, orthogonalized to the DiSC. Third, we infer the direction of
across-day change (orange arrow) as the linear-regression coefficients relating 
production day to variability of renditions from days k and k +1, orthogonalized 
to the DiSC and within-day change. All three sets of coefficients, and the 
corresponding directions in spectrogram space, typically vary across days, 
syllables and birds. The progression of song along the DiSC and along the 
(orthogonalized) directions of within-day and across-day change are obtained 
by projecting renditions on day k and k + 1 onto the corresponding directions.
b, Example rendition of syllable b as in Fig. 1 (top, encapsulated by red lines) and 
inferred coefficients (directions in spectrogram space; bottom) for day k=57. 
Bright and dark shades of grey mark spectrogram bins for which power 
increases or decreases, respectively, over the corresponding timescales in a.
c, Dependency of cross-validated regression quality (fraction of variance
explained; y axis) on the regularization constant (λ) for the estimation of the
DiSC. One regularization constant was chosen for each syllable and the
direction based on maximizing the leave-one-out cross-validation error on the
training set. d, Progression of syllable b along the directions of change shown
in b, during days 57 and 58. Renditions from each day are binned into ten
consecutive periods on the basis of production time within the day (analogous 
to the ten periods in Fig. 3a, b; curves and error bars represent means and 95% 
bootstrapped confidence intervals). For simplicity of visualization, the time 
elapsed (x axis) during the night between days k and k + 1 is not shown to scale.
The position along the DiSC for the morning of day k + 1 is close to that for the
evening of day k, indicating overall strong consolidation (left). The position
along the direction of within-day change is reset overnight, implying that the 
underlying changes are not consolidated (middle). The position along the 
direction of across-day change jumps overnight, consistent with offline 
learning (right). We note that strong consolidation, weak consolidation and 
offline learning have all been reported previously, albeit in different 
behaviours and species. The charts in d show that these different
patterns of change can occur in the very same syllable along distinct spectral
features (see also Fig. 1h and Extended Data Fig. 8). By considering features
with different projections onto these directions, a wide range of consolidation
patterns can be uncovered (see also Fig. 1h). e, As for d, but averaged across all
four-day windows during days 60-69 and over all syllables and birds (same five
birds as in Figs. 2, 3). The resulting averages include contributions from the
entire behavioural repertoire, including regressions, typical renditions and 
anticipations. The two right-most panels show concurrent progression along 
the DiSC and the direction of within-day or across-day change, combining data 
from the first and second, or first and third, panels in e. These representations
are analogous to, and in qualitative agreement with, the behavioural trajectories
in Fig. 3h-k (typical). f, Analogous to e, but computed on vocalizations
represented by 32 acoustic features instead of spectrograms. Directions as in e
can be retrieved, but progression along the DiSC appears noisier, suggesting 
that the 32 acoustic features do not fully capture in particular the slow spectral 
changes occurring over development (see also Extended Data Fig. 9). g, h,
Contribution of individual acoustic features to the directions of slow, within-day
and across-day change. As in f, the directions are computed in the space of
32 acoustic features. g, Distribution of coefficients in the retrieved 
orthonormalized directions. Thick and thin black bars represent means and 
95% confidence intervals; crosses show outliers; thin vertical lines represent 
medians. h, Means (solid lines) and medians (dotted lines) of the signed (left) or 
unsigned (right) distributions in g. Most coefficients are small and variable,
indicating that the alignment between any of the 32 acoustic features and the 
inferred directions of change is weak and highly variable over time, syllables 
and birds. 
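
The orthogonalization and projection steps described in a can be sketched as follows. The example assumes that the three coefficient vectors have already been estimated by regression; the vectors, the toy rendition and the helper names are hypothetical. Gram-Schmidt is used here as one standard way to orthogonalize and may differ from the authors' procedure (see Supplementary Methods).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def orthogonalize(v, basis):
    """Remove from v its components along each orthonormal vector in basis."""
    for b in basis:
        coeff = dot(v, b)
        v = [a - coeff * c for a, c in zip(v, b)]
    return normalize(v)


# Hypothetical regression coefficients in a 4-bin "spectrogram" space.
disc_raw = [1.0, 0.5, 0.0, 0.0]     # relates production day to renditions (slow change)
within_raw = [0.8, 0.0, 1.0, 0.0]   # relates period within the day to renditions
across_raw = [0.3, 0.2, 0.4, 1.0]   # relates production day on days k and k + 1

disc = normalize(disc_raw)
within_day = orthogonalize(within_raw, [disc])
across_day = orthogonalize(across_raw, [disc, within_day])

# Position of one toy rendition along each direction of change.
rendition = [0.2, 1.0, -0.5, 0.3]
positions = [dot(rendition, d) for d in (disc, within_day, across_day)]
```

Applying the same projection to every rendition, and averaging within production-time bins, would give curves of the kind shown in d, under the assumptions stated above.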
Extended Data Fig. 8 | Behavioural variability and stratification in an
example syllable. a, Songs of an example bird for three days during 
development. Only those spectrogram segments that belong to a particular
syllable and location in the motif (68-ms window of interest; red dotted lines) 
are analysed in the subsequent panels. b, Developmental changes over the 
course of weeks. Renditions are binned by production day, and averaged. The 
most apparent changes are an increase in pitch and the later successive 
appearance of additional spectral lines at low frequencies. c, Within-day and 
across-day changes for days 60-69. Renditions are binned into five production- 
time periods spanning a day and averaged within bins. On many days, the 
changes within a day do not appear to recapitulate the changes occurring
across days (for example, days 60 and 65; within-day progression does not
smoothly transition between the vocalizations on preceding and subsequent
days; see b). The averages also reveal occasional overnight ‘jumps’ in the
properties of the vocalizations (see, for example, the vertical black arrows). 
d, Comparison of within-day change and change on longer timescales.
Renditions within each period and day were split into strata according to their 
repertoire times (for example, the quintiles in Fig. 2d), resulting in 25 averages,
one for each combination of stratum and period within the day. Only the upper
part of the spectrogram is shown (red rectangle in c). The progression along
strata (x axis) emphasizes the large extent of motor variability along the DiSC
existing within a single day (day 62). e, Same averages as in d, but with x and
y axes swapped. In particular for regressive renditions (quintile 1), change
within day 62 (x axis) does not recapitulate developmental changes occurring
over months (x axis in d). f, Repertoire dating based on repertoire time (as in
Extended Data Fig. 3c). Each point corresponds to a production-time period
and the average of all repertoire times of renditions in that period. Error bars 
show bootstrapped 95% confidence intervals. The change in repertoire time, 
which is computed without using a low-dimensional parametrization of
vocalizations, captures the movement along the DiSC.
Extended Data Fig. 9 | Behavioural change based on alternative distance
metrics and features. To demonstrate the robustness of the proposed nearest- 
neighbour statistics, we verified that the inferred time course of behavioural 
change is reproducible using a number of different distance metrics (used to
define nearest neighbours) and parameterizations of vocalizations. a-d, We 
recomputed the main analyses using a Pearson’s correlation metric on 68-ms 
onset-aligned spectrogram segments (first row); and the Euclidean distance on 
onset-to-offset spectrogram segments that were linearly time-warped to a
duration of 100 ms (second row). For comparison, the main analyses in the text 
were based on Euclidean distance on 68-ms onset-aligned spectrogram 
segments (for example, Fig. 2c-f, 3). a, t-SNE visualization based on the
corresponding distance metrics and sound representation for the example 
bird, analogous to Fig. 2a. b, Repertoire dating averaged over birds, analogous 
to Fig. 3a, b. c, Stratified mixing matrices averaged over birds, analogous to
Fig. 3g. The mixing values are highly correlated across distance metrics:
Euclidean (main text) versus correlation, variance explained = 92%; Euclidean
(main text) versus time-warped Euclidean, 93%. d, Stratified behavioural
trajectories based on c, analogous to Fig. 3h-k. The results in a-d are
consistent with those in Fig. 3, showing that our findings are robust with 
respect to the exact definition of nearest neighbours. Moreover, the overall 
structure of the behavioural trajectory appears to depend only minimally on 
changes in tempo and spectrogram magnitude (first row: Pearson’s correlation
is invariant to changes in overall magnitude of vocalizations; second row: time-warped
Euclidean distance is invariant to changes in tempo). e-h, We
recomputed all main analyses with four additional parameterizations of 
vocalizations: time-dependent normalized acoustic feature traces for 16 
acoustic features within 68-ms windows after syllable onset (first row); means 
and variances of the same 16 acoustic features over entire syllables (second 
row); means and variances of 8 of the 16 acoustic features (third row); and a
one-dimensional parametrization consisting solely of entropy variance computed
over entire syllables (fourth row). Feature means and variances were z-scored
across all syllables. For all of these parameterizations we defined nearest
neighbours with the Euclidean distance. e, Embedding using t-SNE based on the
corresponding parameterization and metric. For entropy variance alone, the 
embedding appears locally one dimensional (for visibility, data points are 
larger than for the other parameterizations). Entropy variance maps mostly 
smoothly onto this one-dimensional manifold (data not shown). f, Repertoire 
dating averaged over birds, analogous to Fig. 3a, b. Repertoire dating based on 
entropy variance alone fails to reproduce most of the results in Fig. 3 obtained 
with spectrogram segments. The percentile curves are almost flat, indicating 
that renditions cannot be reliably assigned to their production times on the
basis of entropy variance alone. In this case, vertical separation between
percentiles cannot be interpreted as spread along the DiSC (see Extended Data
Fig. 5e). For entropy variance alone, span is greater than zero across all
percentiles, but consolidation is consistently close to zero. g, Stratified mixing 
matrix averaged over birds, analogous to Fig. 3g. The match with the mixing 
matrix in Fig. 3g decreases as the dimensionality of the parameterization is 
reduced (spectrogram versus time-dependent feature traces: variance 
explained = 93%; spectrogram versus 16 acoustic feature means and variances, 
91%; spectrogram versus 8 acoustic feature means and variances, 84%; 
spectrogram versus entropy variance, 54%). h, Stratified behavioural 
trajectories based on g, as in Fig. 3h-k. The inferred behavioural trajectories
are similar across the first three song parameterizations. However, these
alternative parameterizations result in more vertical separation between
percentiles in f, suggesting that they capture the direction of slow change less
well (compare with Fig. 3a and Extended Data Fig. 5e). Parameterizations of 
reduced dimensionality also result in progressively less defined syllable 
clusters in the embeddings (e, top to bottom). These observations suggest that 
a parameterization based on the full spectrogram is better suited to capture 
the different directions of change explored during development (see also 
Extended Data Fig. 7). Note that for entropy variance (bottom row), the 
projections onto the local direction of slow change are highly magnified 
compared with the projections in the top panels.
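
The invariances noted for a-d can be checked on toy data. The sketch below treats a vocalization as a one-dimensional trace rather than a spectrogram, defines the correlation-based distance as 1 minus Pearson's r, and implements linear time warping by linear interpolation to a fixed number of samples. These choices and all function names are assumptions for illustration, not the authors' code.

```python
import math


def correlation_distance(x, y):
    """1 - Pearson's r; unchanged when an input is rescaled by a positive factor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def time_warp(trace, n_out):
    """Linearly resample a trace to n_out samples (first and last samples are kept)."""
    last = len(trace) - 1
    out = []
    for i in range(n_out):
        pos = i * last / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(trace[lo] * (1.0 - frac) + trace[hi] * frac)
    return out


syllable = [0.0, 1.0, 3.0, 2.0, 0.5]
louder = [3.0 * v for v in syllable]       # same shape, larger magnitude
slow_ramp = [0.5 * i for i in range(9)]    # a ramp produced at a slower tempo
fast_ramp = [1.0 * i for i in range(5)]    # the same ramp, produced faster

d_euclid = math.dist(syllable, louder)                  # sensitive to magnitude
d_corr = correlation_distance(syllable, louder)         # insensitive to magnitude
d_warp = math.dist(time_warp(slow_ramp, 7), time_warp(fast_ramp, 7))
```

In this toy case the Euclidean distance between `syllable` and `louder` is large, whereas the correlation distance is numerically zero; likewise, the two ramps become indistinguishable once both are warped to a common duration.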
Extended Data Fig. 10 | Behavioural change based on random spectrogram
segments. We recomputed all main-text analyses with a random segmentation 
of behaviour that does not require alignment to syllable onsets. This 
segmentation scheme can be applied to behaviour that does not fall into 
temporally discrete elements. Here each data point corresponds to a randomly
chosen 68-ms spectrogram snippet drawn from a period of singing. Not all
song was sampled, as we used 1,000,000 non-overlapping segments for each 
bird (see Supplementary Methods). a, Vocalizations of the example bird (as in 
Fig. 1a) from day 76, with example segments used for the analysis (at the top).
b, t-SNE visualization for random segments from the example bird, based on
nearest neighbours defined with respect to the Euclidean distance (left) and
average spectrograms for different locations in the (t-SNE) embedding (right;
analogous to Fig. 2b). Clusters corresponding to individual syllables are 
elongated compared with Fig. 2a. Variation along one direction within the 
cluster tends to account for production time (colour bar), while variation along 
another direction tends to reflect the timing of segments relative to syllable 
onsets. c, Embedding from Fig. 2a (bottom) and embedding of random 68-ms 
segments (top). Points in both embeddings are coloured according to cluster 
identity defined on onset-aligned spectrogram segments covering entire 
syllables (bottom). The colour of each point corresponding to a random
snippet (top) corresponds to the cluster identity of the surrounding syllable. 
Some clusters in the embedding based on random segments contain points 
assigned to two different syllables (for example, black versus green colours). 
d, Repertoire dating averaged over birds, analogous to Fig. 3a, b. e, Stratified
mixing matrix averaged over birds, analogous to Fig. 3g. The mixing values are
highly correlated with those in Fig. 3g (variance explained = 89%). f, Stratified
behavioural trajectories based on e, as in Fig. 3h-k. The results in d-f largely
reproduce the corresponding findings obtained with onset-aligned 68-ms 
spectrogram segments (Fig. 3) as well as with other song parameterizations 
(Extended Data Fig. 9). Nonetheless, the overall effect sizes are reduced, 
probably because of the additional variability introduced by the random 
position of segments relative to syllable onsets. In d, the vertical separation
between the 5th and 95th percentiles is increased and the slope of the 50th
percentile is reduced compared with the main-text analyses (Fig. 3a),
suggesting a noisier representation of the direction of slow change (see
Extended Data Fig. 5e) compared with onset-aligned 68-ms segments (Fig. 3).
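
One way to draw random, non-overlapping segments of fixed length from a period of singing is sketched below. The caption does not specify the sampling procedure, so the scheme used here (sorted random offsets within the time left over after reserving room for all segments) and all names and numbers are assumptions for illustration.

```python
import random


def random_segments(total_ms, segment_ms, n_segments, rng):
    """Return sorted start times of n non-overlapping segments in [0, total_ms]."""
    free = total_ms - n_segments * segment_ms
    if free < 0:
        raise ValueError("recording too short for the requested segments")
    # Sorted offsets into the free time; shifting the i-th offset by i segment
    # lengths guarantees that consecutive segments cannot overlap.
    offsets = sorted(rng.uniform(0.0, free) for _ in range(n_segments))
    return [off + i * segment_ms for i, off in enumerate(offsets)]


rng = random.Random(1)  # fixed seed so the example is reproducible
starts = random_segments(total_ms=10_000.0, segment_ms=68.0, n_segments=50, rng=rng)
```

Each start time would then index a 68-ms spectrogram snippet; because the segments are not aligned to syllable onsets, the same scheme applies to behaviour that lacks temporally discrete elements.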
The hippocampus is an important part of the limbic system in the human brain that
has essential roles in spatial navigation and the consolidation of information from 
short-term memory to long-term memory. Here we use single-cell RNA sequencing
and assay for transposase-accessible chromatin using sequencing (ATAC-seq)
analysis to illustrate the cell types, cell lineage, molecular features and transcriptional
regulation of the developing human hippocampus. Using the transcriptomes of
30,416 cells from the human hippocampus at gestational weeks 16-27, we identify
47 cell subtypes and their developmental trajectories. We also identify the migrating
paths and cell lineages of PAX6+ and HOPX+ hippocampal progenitors, and regional
markers of CA1, CA3 and dentate gyrus neurons. Multiomic data have uncovered
transcriptional regulatory networks of the dentate gyrus marker PROX1. We also
illustrate spatially specific gene expression in the developing human prefrontal cortex
and hippocampus. The molecular features of the human hippocampus at gestational
weeks 16-20 are similar to those of the mouse at postnatal days 0-5 and reveal gene
expression differences between the two species. Transient expression of the
primate-specific gene NBPF1 leads to a marked increase in PROX1+ cells in the mouse
hippocampus. These data provide a blueprint for understanding human
hippocampal development and a tool for investigating related diseases.


The hippocampal formation (hippocampus) is a compound structure
under the cerebral cortex in primates that forms and stores long-term
memory by consolidating information from short-term memory, and
also processes spatial information and navigation.


Hippocampus single-cell transcriptome 


To understand the molecular features of hippocampal cells during 
human brain development, we analysed 30,416 cells from the entire 
left hippocampus (including the hippocampus proper, the dentate 
gyrus (DG) and some of the subiculum connected to the hippocampus 
proper) at gestational weeks (GW) 16-27 (Supplementary Table 1) by 
droplet-based single-cell RNA sequencing (scRNA-seq). We performed 
t-distributed stochastic neighbour embedding (t-SNE) analysis and
identified cells as progenitors, excitatory neurons (ExN), inhibitory
neurons (InN), Cajal-Retzius cells, astrocytes, oligodendrocyte progenitor
cells (OPCs), oligodendrocytes, microglia and endothelial
cells by using classic markers and gene ontology (GO) of differentially
expressed genes (DEGs) (Fig. 1a-c, Extended Data Fig. 1a-c). The distributions
of samples from two individuals at GW22 were similar on the
t-SNE plot (Extended Data Fig. 1d). We then used the DG marker PROX1
to subclassify the ExN as DG ExN or non-DG ExN. The InN were further
subclassified as being derived from the medial or caudal ganglionic
eminence (MGE or CGE) on the basis of LHX6 and NR2F2 expression
(Fig. 1a-d). PROX1 is an essential transcription factor for the genesis
of hippocampal granule cells and formation of the DG. By searching
transcription factor motifs identified from ATAC-seq peaks close
to the PROX1 transcription start site (TSS), we found three potential
binding sites for LEF1 or TCF4, indicating that WNT signals are crucial
for the production of DG granule cells (Fig. 1e, f), which is consistent
with reported studies. We further segregated cells into 47 distinct
hierarchical subtypes by principal component analysis (PCA), showing
that different subtypes of progenitors were highly correlated with
fate-determined cells (Fig. 1g, Extended Data Fig. 2a-c).

To study developmental differences between the hippocampus 
and neocortex, we compared the transcriptome of the hippocampus 
(GW16-27) with that of the human prefrontal cortex (PFC) (GW8-26)
(Fig. 1h) and found differences in gene expression between the PFC 
and hippocampus across all cell types (Fig. 1i, Supplementary Table 2). 
The HMG box domain-containing protein TOX was highly expressed 
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Fig. 1| Molecular diversity of single cells from the developing human 
hippocampus. a-c, Visualization of eleven major classes using ¢-SNE in 3D (a) 
and 2D (b) visualization. c, Expression of known markers. HP, hippocampus; 
PFC, prefrontal cortex. Dots, individual cells; grey, no expression; red, relative 
expression (log-normalized gene expression). d, Immunostaining of MEIS2 and 
PROX1. Scale bar, 500 um. e, Normalized ATAC-seq profiles of PROX1 in GW25 
hippocampus show the activation of PROX1. Amplified view (pink) shows 
predicted LEFland TCF4 binding sites. f, LEF1 and TCF4 binding motifs are 
identified in the ATAC-seq peaks close to the PROXITSS. g, Hierarchical 
clustering analysis of 47 subclasses. AC, astrocyte; P, progenitor; CR, Cajal- 
Retzius cell; M, microglia; EC, endothelial cell. n =134, 141, 95, 275, 58, 300, 397, 
159, 204, 483, 101, 74, 670, 1,019, 1,765, 2,334, 793, 1,073, 909, 3,189, 2,347, 92, 


in the PFC, whose progenitors are regulated by HMGA2”*. SOX4 and 
SOXI11, two SOXC transcription factors that are required for neuronal 
differentiation during neurogenesis in the adult hippocampus”, were 
relatively highly expressed in the hippocampus (Fig. 1i). GO analysis of 
DEGs between ExN from the hippocampus and PFC at GW16 indicate 
that hippocampal ExN may undergo synapse organization and axono- 
genesis at GW16 (Extended Data Fig. 2d). Comparison of the matu- 
ration trajectories of hippocampus and PFC neurons indicated that 
hippocampus non-DG ExN were more mature than PFC ExN, whereas 
maturation of DG ExN was similar to that of PFC ExN (Fig. 1j). InN of the 
PFC and hippocampus generally showed a similar maturation status, 
whereas MGE-derived InN were more mature than CGE-derived InNin 
the hippocampus (Fig. 1k). Consistent with transcriptome analysis, 
immunofluorescence staining for OLIG2 and MBP showed anumber of 
MBP* cells in the subfield of the hippocampus, whereas no MBP* cells 
were found inthe human PFC at GWI16 (Fig. 11, m, Extended Data Fig. 2e), 
suggesting that oligodendrocytes may be involved the maturation of 
hippocampal neurons during early development. 


Progenitors of the developing hippocampus 

To further investigate cellular lineage relationships inthe fetal human 
hippocampus, we reconstructed five developmental paths by monocle 
analysis without microglia and endothelial cells (Fig. 2a). Three major 
subgroups of progenitors differentiated to excitatory neuronal, OPC 
and oligodendrocyte or astrocyte lineages. The MGE- and CGE-derived 
InN were separated in different directions, which is consistent with pre- 
vious studies showing that hippocampal InN originate from different 
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1,838, 717, 1,956, 730, 1,192, 2,573, 54, 259, 84,84, 44, 489, 465, 246, 68,229, 638, 
540, 131, 103, 139, 257, 227, 459 and 282 cells, top to bottom. h, Abstracted graph 
shows the connections on the transcriptome between different cell typesinthe 
developing human hippocampus and PFC. i, Scatterplot of all genes for 
correlation with conserved differentiation network across PFC and 
hippocampus. Blue plot shows genes related to PFC; red plot shows genes 
related to hippocampus. j,k, Maturation scores of excitatory neurons (j) and 
inhibitory neurons (k) in PFC and hippocampus. I, m, Immunostaining for 
oligodendrocyte markers at GW16 in human hippocampus and prefrontal 
cortex. Scale bars, 500 pm (I, left); 100 pm (1, right, m). The experiment was 
repeated three times independently with similar results. 


progenitors located in the ganglionic eminence. To further reveal the
diversity and molecular properties of human hippocampal progenitors,
we used GO analysis of DEGs and marker genes to identify eight
subclusters (Extended Data Fig. 3a-c). EOMES+, MEIS2+ and NEUROD2+
progenitors were in clusters P3 and P4, indicative of ExN generation
(Extended Data Fig. 3a). AQP4, OLIG1/OLIG2 and PDGFRA were highly
expressed in clusters P5, P6 and P7, respectively, indicative of astrocyte 
and oligodendrocyte cell fates. Cluster P8 contained a small number 
of progenitors that highly expressed DLX1 and DLX2, indicating that 
these cells may differentiate as InN (Extended Data Fig. 3a-f). 

To understand how progenitors develop into neuronal and glial cells, 
we carried out trajectory analysis (Fig. 2b) and separated three paths 
towards neurons, astrocytes and oligodendrocytes. Notably, PAX6+ and
HOPX+ progenitors, which are considered as neurogenic progenitors in
the neocortex, were likely to contribute to both neurogenesis and gliogenesis
in the human hippocampus (Fig. 2b, Extended Data Fig. 3g, h).
We next examined the locations of cells expressing PAX6 or HOPX 
by immunofluorescence staining (Fig. 2c-f). At GW11, the primordial 
hippocampal area, located adjacent to the cortical hem (CH), was 
composed of the dentate neuroepithelium (DNE) and ammonic neuroepithelium
(ANE). The majority of cells in the DNE and ANE expressed
SOX2, and PAX6+ SOX2+ progenitors of ANE started to migrate (Fig. 2c,
Extended Data Fig. 4a). At the same time, HOPX+ SOX2+ DNE progenitors
also indicated migration potential (Fig. 2d, Extended Data Fig. 4b). As
the hippocampus developed at GW14, a number of PAX6+ progenitors
migrated away from the ventricular zone towards the future DG (PROX1+
region, Fig. 2e, Extended Data Fig. 4c), forming the primary matrix (I)
and secondary matrix (II). HOPX+ progenitors also migrated in the same

Fig. 2 | Molecular signature of neural progenitor cells of the developing
human hippocampus. a, Cell lineage relationships of all cells analysed except 
for microglia and endothelial cells in developing human hippocampus. 
Monocle recovered a branched single-cell trajectory beginning with 
progenitors and terminating at excitatory neurons, inhibitory neurons, 
astrocytes and oligodendrocytes. b, Cell lineage relationships of progenitors, 
excitatory neurons, astrocytes and oligodendrocytes in developing human 
hippocampus. Known gene expression is shown below. Arrows show the 
directions of lineages. c, Immunofluorescence images of PROX1 (scale bar,
200 µm), PAX6 and SOX2 at GW11. Scale bars, 500 µm (left), 200 µm (top right),
100 µm (bottom right). I, primary matrix. d, Immunofluorescence images of


direction but closer to the pial side (Fig. 2e, Extended Data Fig. 4c). 
At GW16, the migration of PAX6+ and HOPX+ progenitors continued
and many cells arrived at the hilus and formed an origin hub of DG
cells, called the tertiary matrix (III). Notably, PAX6+ progenitors were
located outside HOPX+ progenitors while migrating (Fig. 2f, Extended
Data Fig. 4d-f). PAX6+ progenitors were still abundant, and some were
located in the blades of the DG, but only some HOPX+ progenitors were
found in the hilus; the majority of HOPX+ progenitors were in the cornu
ammonis (CA) at GW22 (Fig. 2f, Extended Data Fig. 4g-i).

We next evaluated the proliferation capacity and cell fate of PAX6+
and HOPX+ progenitors. Both scRNA-seq data and immunostaining
indicate that a subpopulation of PAX6+ and HOPX+ progenitors are
active in the cell cycle even at the mid-gestational stage (Extended Data
Fig. 5a-d). In cell fate assessment, we observed PAX6+ NEUROD1+ cells
in the CA and DG, but PAX6+ GFAP+ cells only in the CA (Fig. 2g, Extended
Data Fig. 5e-g). Similar expression patterns were found in HOPX+ cells
(Fig. 2h, Extended Data Fig. 5h, i). Next, we evaluated the maturation
status of PAX6+ or HOPX+ progenitors and found that NEUROD1+ cells
were more mature than GFAP+ cells, suggesting that they may have been
born earlier (Extended Data Fig. 5j, k). Together, our data suggest that
although the origins and migrating paths of PAX6+ and HOPX+ progenitors
differ, they both contribute to neural and glial genesis in a spatiotemporal
manner in the developing human hippocampus (Fig. 2i).


Neurons in developing hippocampus 

To further investigate the developmental characteristics of hippocam- 
pal neurons, we subclassified all the excitatory neurons into seven 
groups by PCA (Fig. 3a, b). Excitatory neurons from CA1, CA3 and DG

HOPX and SOX2 at GW11. Scale bars, 500 µm (left), 200 µm (top right), 100 µm
(bottom right). e, Immunofluorescence images of PROX1 (scale bars, 1,000 µm,
inset 500 µm), PAX6, HOPX and SOX2 in GW14. Scale bar, 200 µm. II, secondary
matrix. f, Immunofluorescence images of PROX1 (scale bar, 1,000 µm), PAX6,
HOPX and SOX2 at GW16 (top) and GW22 (bottom). Scale bar, 500 µm. III,
tertiary matrix. g,h, Immunofluorescence images of PAX6, HOPX, NEUROD1
and GFAP at GW25. Scale bars, 500 µm (left); 100 µm (right). c-h, The
experiments were repeated three times independently with similar results.

i, Schema depicting locations of PAX6+ or HOPX+ progenitors in developing
human hippocampus from GW11 to GW22. Arrows indicate direction of
migration. 


were grouped as ExN01-03 (Fig. 3b, Extended Data Fig. 6a). SEMA5A and
PID1 were selected as marker genes for DG and CA1, respectively, while
SULF2 and NRIP3 were considered as CA3 markers (Fig. 3c, Extended 
Data Fig. 6b). Consistent with progenitor migration paths, the matura- 
tion analysis suggests that CA1 neurons were more mature than CA3 and 
DG neurons (Fig. 3d). Excitatory neurons were categorized into three 
groups according to their developmental stage, and GO analysis of DEGs 
indicates that neurogenesis is the major event at GW16-18, followed 
by axonogenesis (GW20-22) and function development (GW25-27) 
(Fig. 3e-g). To further analyse the transcriptional regulation of DG 
formation, we selected the subclusters of highly variable genes and 
clustered them into nine modules by weighted gene coexpression 
network analysis (WGCNA) (Extended Data Fig. 6c, d). The green module
includes PROX1, suggesting that the genes in this module may be
correlated with DG development (Fig. 3h). When we analysed ATAC-seq
data for the hippocampus at GW25, we found PROX1 motifs in ATAC
peaks close to the TSSs of several genes, including KCNJ6, NFIA, DUSP1
and NPTX2, which are also in the green module (Fig. 3i,j). Among these
genes, KCNJ6 (also known as GIRK2) encodes a member of the G-protein-activated
inwardly rectifying K+ channels that is widely abundant in the
brain and has been implicated in learning and memory, reward, motor
coordination, and other functions.

Hippocampal inhibitory neurons arise from MGE and CGE precursors.
Notably, monocle analysis suggested that the majority of MGE-derived
InN (LHX6+) and CGE-derived InN (NR2F1/2+) were separated
(Fig. 3k, Extended Data Fig. 7a-c). The pseudo-time analysis demonstrated
that InN expressing CCK, CALB2 and VIP accumulated in the
CGE differentiation path, and the majority of SATB1+ and SST+ neurons
were in the MGE path (Fig. 3l, Extended Data Fig. 7b, c). Additionally,

Fig. 3 | Dynamics of neurogenesis in the developing human hippocampus.
a, Visualization of seven subtypes of excitatory neuron in the developing 
human hippocampus using t-SNE. Sample sizes of clusters: 2,573, 2,347, 1,838, 
1,956, 1,192, 717, 92 cells. b, Heat map showing the expression level and identity 
of genes in the excitatory neurons subclasses. Top, distribution of each 
subclass by gestational week. c, In situ hybridization of region-specific genes in 
DG, CA1 and CA3 at GW27. Scale bar, 600 µm. The experiment was repeated
three times independently with similar results. d, Maturation scores of seven
subtypes of excitatory neuron show that CA1 neurons are more mature than
CA3 and DG neurons. e-g, The enriched gene ontology terms show the cell
properties of the hippocampus at different weeks. Sample sizes: 4,912 cells (e);


we found genes that may regulate cell fate determination at the first 
branch point (Extended Data Fig. 7d, e). Microglia, the immune cells 
in the CNS, originate from the mesoderm. We classified microglia
into 11 subclusters and observed that M9 contained microglia in active
cell cycles from all developing stages (Extended Data Fig. 8a-d). The 
immunostaining images also indicated proliferating microglia at GW25 
(Extended Data Fig. 8e). 


Evolution signatures of developing hippocampus 

Although the hippocampus is considered an evolutionarily conserved 
part of the brain, transcriptomic correlation coefficient analysis illustrated
that the developmental timing of the human hippocampus from
GW16 to 20 was similar to that at P0-5 in mice (Fig. 4a), suggesting
that the human embryonic hippocampal development occurs earlier 
but lasts for longer than in mice. We also found DEGs in the human 
hippocampus, some of which are primate-specific, including STX10, 
CHMP4A, BEX5, NBPF1 and the long non-coding RNA CASC15 (Fig. 4b,
c). In situ images and ATAC-seq data identified the mRNA localization
and transcription regulatory sites of these genes (Fig. 4c). Genes
of the neuroblastoma breakpoint family (NBPF) contain a repeated

4,164 cells (f); 1,639 cells (g). h, The social network Cytoscape graph depicts the
gene network regulation of excitatory neurons. i,j, Motifs of PROX1 (i) and the
normalized ATAC-seq profile of downstream genes of PROX1 (j) in GW25
hippocampus with three independent biological replicates. k, Cell lineage 
relationships of progenitors and inhibitory neurons analysed in developing 
human hippocampus. Monocle recovered a branched single-cell trajectory 
beginning with progenitors and terminating at subgroups of inhibitory 
neurons. l, Markers were ordered by Monocle analysis in pseudo-time. Line
with blue shading represents inhibitory neurons derived from CGE; pink 
shading represents inhibitory neurons derived from MGE. 


domain called DUF1220, the copy number of which is related to brain
evolution and complexity. Several NBPF family genes are expressed
in hippocampal cells, and the expression of NBPF1 was relatively high
and general in all cell types (Fig. 4c, Extended Data Fig. 9a). NBPF1 with
eight DUF1220 domains exists only in primates, and in particular in spe- 
cies that are evolutionally close to humans (Extended Data Fig. 9b, c). 
To further investigate its role in hippocampal development, we 
transiently expressed NBPF1 in the mouse primordial hippocampal 
area at embryonic day 13.5 (E13.5) and observed that these mice had 
more PROX1+ cells and an enlarged PROX1+ area at E15.5 and E18.5
when compared with control mice (Fig. 4d-g, Extended Data Fig. 9d-f). 
To understand how NBPF1 regulates hippocampal development, we 
collected single GFP+ cells (Extended Data Fig. 9g). LHX2 has been
considered as an essential gene in the hippocampal primordium to
regulate hippocampal neuronal development. Single-cell quantitative
RT-PCR results indicated that LHX2 expression was higher in
NBPF1-GFP+ cells (Extended Data Fig. 9h). Further analysis of open
chromatin areas close to the PROX1 TSS revealed three potential
sites for LHX2 binding (Extended Data Fig. 9i), indicating a possible
molecular mechanism by which NBPF1 may regulate hippocampal
development via LHX2.

Fig. 4 | Specific genes expressed in the human developing hippocampus.

a, Heat map showing correlation of different stages of hippocampus 
development in human and mouse. The developing human hippocampus is
similar to the developing mouse hippocampus at P0-5. b, Heat map showing
DEGs in human and mouse hippocampus. c, Expression of human-specific
genes in t-SNE plots (left). Right, in situ hybridization at GW25; bottom,
normalized ATAC-seq profile of human-specific genes in hippocampus in GW25 


Discussion 


We have systematically analysed scRNA-seq and ATAC-seq data to identify
cell type diversities, gene expression trajectories, transcription
regulation networks and signal transduction pathways in the develop- 
ing human hippocampus. The hippocampus starts to form from the 
hippocampal primordium in response to bone morphogenetic protein 
(BMP) and WNT secreted by the CH. An open chromatin area close
to the PROX1 TSS contains the binding motif for LEF1 and TCF4, two
transcription factors that are involved in the WNT signalling pathway 
by recruiting the coactivator beta-catenin to enhancer elements of 
targeting genes, indicating that WNT signals not only initiate differentiation
of the medial pallium to the hippocampus, but also contribute
to subregional patterning of the hippocampus. The adult neural stem 
cells located in the subgranular zone give rise to granule cells throughout
adult life in most mammals. WNT signalling also helps to regulate
granule cell genesis and neural activity in adult mammals, indicating
that the key gene regulation may be conserved in embryonic and adult 
neurogenesis in the hippocampal DG. 

HOPX has been recently identified as a gene that is expressed by 
dentate precursors and contributes to embryonic and postnatal neurogenesis
in mice. Another unbiased single-cell RNA-seq analysis
has indicated that perinatal, postnatal, and adult neurogenesis in the
mouse DG are fundamentally similar. Notably, clonal lineage-tracing
of HOPX+ cells in mice showed that these precursors generate neurons
located in the DG or CA. Consistently, we found that at GW11, although
most HOPX+ progenitors are located in the DNE, a subset of HOPX+
progenitors is found in the ANE, indicating that HOPX+ progenitors in
different locations may have different cell fates. 

The copy number of the DUF1220 protein domain in the genome is 
correlated with the evolutionary proximity of the species to humans as 
well as with brain size, cognitive capability, and severity of autism.
Major copies of human DUF1220 domains are encoded by the NBPF gene
family. Microarray data from the Allen Brain Atlas suggest that NBPF1

with three independent biological replicates. Scale bar, 300 µm.

d,e, Overexpression of NBPF1 promotes DG formation at E13.5, observed at 
E15.5 (d) and E18.5 (e) in mouse. Scale bars, 500 µm (d, left), 100 µm (d, right),
200 µm (e). f,g, Percentage of PROX1+ cells among GFP+ cells. f, E13.5-E15.5:
**P = 0.0049, two-sided t-test. n = 6, 5 brain slices per experiment; mean ± s.d.
g, E13.5-E18.5: **P = 0.0015. n = 6, 5 brain slices per experiment. IUE, in utero
electroporation. 


expression decreases when the human brain develops
(http://www.brainspan.org). LHX2 is expressed in the dorsal and medial pallium but
not in the CH, which secretes WNT ligands and functions as an organizer
that is necessary and sufficient to induce the hippocampus. Notably,
expression of NBPF1 upregulates LHX2 expression and increases the
number of hippocampal PROX1+ granule cells in the developing mouse
brain. However, the detailed molecular mechanisms of this process
need further investigation.


Methods


No statistical methods were used to predetermine sample size. The 
experiments were not randomized and investigators were not blinded 
to allocation during experiments and outcome assessment. 


Tissue sample collection 

The de-identified human tissue collection and research protocols were 
approved by the Reproductive Study Ethics Committee of Beijing Anz- 
hen Hospital and the institutional review board (ethics committee) 
of the Institute of Biophysics. The informed consent was designed as 
recommended by the ISSCR guidelines for fetal tissue donation and 
fetal tissue samples were collected after the donor patients signed
an informed consent document that was in strict observance of the
legal and institutional ethical regulations for samples from elective 
pregnancy terminations at Beijing Anzhen Hospital, Capital Medical 
University. All samples used in these studies had not been involved in 
any other procedures. All the protocols were in compliance with the 
Interim Measures for the Administration of Human Genetic Resources, 
administered by the Ministry of Science and Technology of China. 


Animals 

Timed pregnant female mice at embryonic day 13.5 were used for 
in utero electroporation experiments. Embryos for experiments after 
in utero electroporation included both male and female mice. Mouse 
housing and experimental protocols in this study were in compliance 
with the guidelines of the Institutional Animal Care and Use Committee 
of the Institute of Biophysics, CAS. All mice had free access to food and 
water and were housed in the institutional animal care facility with a 
12-h light-dark schedule. 


Tissue sample dissection 

Gestational age was measured in weeks from the first day of the wom- 
an’s last menstrual cycle to the sample collecting date. Fetal brains 
were collected in ice-cold artificial cerebrospinal fluid containing 
125.0 mM NaCl, 26.0 mM NaHCO3, 2.5 mM KCl, 2.0 mM CaCl2, 1.0 mM
MgCl2, 1.25 mM NaH2PO4 at a pH of 7.4 when oxygenated (95% O2 and 5%
CO2). The hippocampus was dissected and put in Hibernate E medium
(Invitrogen, Cat. A1247601). The hippocampus tissue was first digested
in 2 mg/ml collagenase IV (Gibco, Cat. 17104-019) and 10 U/µl DNase I
(NEB, Cat. M0303L) in Hibernate E medium and then in 1 mg/ml papain
(Sigma, Cat. P4762) and 10 U/µl DNase I in Hibernate E medium. Samples
were vortexed at 300g and 37 °C on a thermocycler for 20 min. Further
pipetting was used to fully digest the tissue into single cells. After that, 
the cell suspension was centrifuged at 700g for 5 min to obtain the cell 
pellet. The digestion medium was carefully removed and the cell pellet 
was resuspended in 300 µl 0.04% BSA in PBS and kept on ice.


RNA library preparation for high-throughput sequencing 

Thousands of cells were partitioned into nanolitre-scale Gel Bead- 
In-EMulsions (GEMs) using 10x GemCode Technology, where cDNA 
produced from the same cell shares a common 10x Barcode. Upon dissolution
of the single cell 3’ gel bead in a GEM, primers containing an
Illumina R1 sequence (read 1 sequencing primer), a 16-bp 10x Barcode, a
10-bp randomer and a poly-dT primer sequence were released and mixed
with cell lysate and Master Mix. After incubation of the GEMs, barcoded, 
full-length cDNA from poly-adenylated mRNA was generated. Then 
the GEMs were broken and silane magnetic beads were used to remove 
leftover biochemical reagents and primers. Prior to library construc- 
tion, enzymatic fragmentation and size selection were used to optimize 
the cDNA amplicon size. P5, P7, a sample index and R2 (read 2 primer 
sequence) were added to each selected cDNA during end repair and 
adaptor ligation. P5 and P7 primers were used in Illumina bridge amplification
of the cDNA (http://10xgenomics.com). Finally, the library was
sequenced into 150-bp paired-end reads using the Illumina HiSeq4000. 


Data processing of scRNA-seq from Chromium system

Cell Ranger 2.0.1 (http://10xgenomics.com) was used to perform quality
control and read counting of Ensembl genes with default parameters
(v2.0.1) by mapping to the hg19 human genome. We excluded poor- 
quality cells after the gene-cell data matrix was generated by Cell Ranger 
software using the Seurat package (v2.3.4). Only cells that expressed 
more than 800 genes and fewer than 7,000 genes were considered, and 
only genes expressed in at least 30 single cells (0.1% of the raw data) 
were included for further analysis. Cells that expressed haemoglobin 
genes (HBM, HBA1, HBA2, HBB, HBD, HBE1, HBG1, HBG2, HBQ1 and HBZ) 
were also excluded. Cells with a mitochondrial gene percentage over 
15% were discarded. In total, 17,737 genes across 30,416 single cells 
remained for subsequent analysis. The data were normalized to a total 
of 1 × 10⁴ molecules per cell for the sequencing depth using the Seurat
package. The batch effect was mitigated by using the ScaleData func- 
tion of Seurat (v2.3.4). 
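
The filtering and normalization steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the Seurat implementation used in the study: the function name `filter_and_normalize` is hypothetical, mitochondrial genes are assumed to be recognizable by an `MT-` prefix, and the log transform after scaling is an assumption that the source does not state.

```python
import math

# Haemoglobin genes listed in the text; cells expressing any of them are dropped.
HB_GENES = {"HBM", "HBA1", "HBA2", "HBB", "HBD", "HBE1", "HBG1", "HBG2", "HBQ1", "HBZ"}


def filter_and_normalize(counts, genes, min_genes=800, max_genes=7000,
                         min_cells=30, max_mito=0.15, scale=1e4):
    """counts[i][j] is the raw UMI count of gene j in cell i.

    Returns (indices of kept cells, names of kept genes, normalized matrix).
    """
    # Keep genes detected in at least `min_cells` cells of the raw matrix.
    keep_gene = [sum(1 for row in counts if row[j] > 0) >= min_cells
                 for j in range(len(genes))]

    kept_cells = []
    for i, row in enumerate(counts):
        detected = sum(1 for c in row if c > 0)
        total = sum(row)
        # Assumption: mitochondrial genes carry the "MT-" prefix.
        mito = sum(c for c, g in zip(row, genes) if g.startswith("MT-"))
        has_hb = any(c > 0 for c, g in zip(row, genes) if g in HB_GENES)
        if not (min_genes < detected < max_genes):
            continue
        if has_hb or total == 0 or mito / total > max_mito:
            continue
        kept_cells.append(i)

    kept_genes = [g for g, keep in zip(genes, keep_gene) if keep]
    normalized = []
    for i in kept_cells:
        row = [c for c, keep in zip(counts[i], keep_gene) if keep]
        total = sum(row)
        # Scale each cell to `scale` molecules, then log1p (assumed transform).
        normalized.append([math.log1p(c / total * scale) if total else 0.0
                           for c in row])
    return kept_cells, kept_genes, normalized
```

The default thresholds mirror the values quoted in the text (800 to 7,000 genes per cell, 30 cells per gene, 15% mitochondrial reads); on a toy matrix they would need to be lowered.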


Identification of cell types and subtypes by dimensional 
reduction and PAGA analysis 

The Seurat package (v2.3.4) was used to perform linear dimensional 
reduction. We selected 982 highly variable genes with average expres- 
sion between 0.0125 and 8 and dispersion greater than 2 as input for 
PCA. Then we identified significant PCs based on the JackStrawPlot 
function. Strong PC1-PC10 were used for t-SNE to cluster the cells by 
FindClusters function with resolution 1.2. Clusters were identified by 
the expression of known cell-type markers and GO analysis. The markers 
ASCL1, NEUROD2, GAD1, OLIG2, MBP, AQP4, SPARC and PTPRC were 
used to classify hippocampal cells as progenitor cells, excitatory neurons,
inhibitory neurons, OPCs, oligodendrocytes, astrocytes, endothelial 
cells and microglia, respectively. 
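
As a rough illustration of marker-based labelling, the sketch below assigns each cluster the cell type whose marker has the highest mean expression. The marker-to-type mapping comes from the text; the "highest mean wins" rule and the function name `annotate_clusters` are assumptions made for the example, since the study also used GO analysis to identify clusters.

```python
# Marker genes and the cell types they indicate, as listed in the text.
MARKER_TO_TYPE = {
    "ASCL1": "progenitor",
    "NEUROD2": "excitatory neuron",
    "GAD1": "inhibitory neuron",
    "OLIG2": "OPC",
    "MBP": "oligodendrocyte",
    "AQP4": "astrocyte",
    "SPARC": "endothelial cell",
    "PTPRC": "microglia",
}


def annotate_clusters(cluster_means):
    """cluster_means maps cluster id -> {gene: mean expression in that cluster}."""
    labels = {}
    for cluster, means in cluster_means.items():
        # Pick the marker with the highest mean expression in this cluster.
        best = max(MARKER_TO_TYPE, key=lambda gene: means.get(gene, 0.0))
        if means.get(best, 0.0) > 0:
            labels[cluster] = MARKER_TO_TYPE[best]
        else:
            # No marker is expressed, so leave the cluster unlabelled.
            labels[cluster] = "unassigned"
    return labels
```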

Three-dimensional t-SNE was applied to cluster all cells in the human
developing hippocampus (dim.embed = 5) with PC1-PC10. Visualizations
were done using the rgl package (v0.99.16) implemented in R. We
then applied partition-based graph abstraction (PAGA) to predict a
lineage tree for the hippocampal and the prefrontal cortical cells in an
unbiased way. We produced a consolidated lineage tree that included
all identified cell types rooted to a stem cell group.


Identification of DEGs among clusters 

The DEGs of each cluster were identified using the FindAllMarkers function
(thresh.use = 0.25, test.use = “wilcox”) with the Seurat R package
(6). We used the Wilcoxon rank-sum test (default), and genes with average
expression difference >0.5 natural log with P < 0.05 were selected
as marker genes. Enriched GO terms of marker genes were identified
using DAVID 6.8 (https://david.ncifcrf.gov/home.jsp) and Metascape
(http://metascape.org).
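
The marker-gene criterion can be illustrated with a small standard-library sketch. It is not Seurat's implementation: the rank-sum P value here uses a normal approximation with tie correction and no continuity correction, the expression difference is taken as a simple difference of mean log-expression, and the names `rank_sum_test` and `is_marker` are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate P value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


def is_marker(in_cluster, others, min_log_diff=0.5, alpha=0.05):
    """Apply the thresholds from the text to log-scale expression values."""
    diff = sum(in_cluster) / len(in_cluster) - sum(others) / len(others)
    _, p = rank_sum_test(in_cluster, others)
    return diff > min_log_diff and p < alpha
```

For real analyses an established statistics library would be preferable; the sketch only shows how the two thresholds (difference above 0.5 natural log, P below 0.05) combine.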


Constructing single cell trajectories in the hippocampus 

The Monocle 2 R package (version 2.6.4) and Monocle 3 alpha R package
(version 2.99.2) were applied to construct single cell pseudo-time trajectories
to discover developmental transitions. We used highly variable
genes identified by Seurat to sort cells into pseudo-time order. The
actual gestational time of each cell informs us which states of cells are at 
the beginning of pseudo-time in the first round of “orderCells”. We then 
call “orderCells” again, passing this state as the root_state argument. 
“DDRTree” and “UMAP” were applied to reduce dimensional space and
the minimum spanning tree on cells was plotted using the visualization 
functions “plot_complex_cell_trajectory” or “plot_3d_cell_trajectory” 
for Monocle 2 and Monocle 3 alpha, respectively. 
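
One way to read the two-pass ordering described above is: after the first ordering, find the trajectory state that best matches the earliest gestational time, then reorder with that state as the root. The sketch below shows only the root-selection step in Python. The rule "state holding the most cells from the earliest sampled week" and the name `pick_root_state` are assumptions; the source does not spell out the exact criterion.

```python
from collections import Counter


def pick_root_state(states, gestational_weeks):
    """Return the trajectory state holding the most cells from the earliest week.

    states[i] is the state assigned to cell i by a first ordering pass;
    gestational_weeks[i] is the actual gestational week of that cell.
    """
    earliest = min(gestational_weeks)
    counts = Counter(state for state, week in zip(states, gestational_weeks)
                     if week == earliest)
    return counts.most_common(1)[0][0]
```

The returned state would then be supplied as the root when the cells are ordered a second time.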


Cell-cycle analysis 

In the cell-cycle analysis, we applied a cell-cycle related gene set 
with 43 genes expressed during G1/S and 54 genes expressed during 
G2/M. We defined the G1/S and G2/M states of each cell by comparing
the average expression of the two gene sets using the CellCycleScoring
function of the Seurat R package. These gene sets should be anticorrelated
in their expression levels, and cells expressing neither are likely
to be in the G1 phase (not cycling).
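
The phase-assignment logic can be sketched as follows. This is a simplified stand-in for the scoring used in the study: each score is the mean expression of a gene set minus the cell's mean over all genes (an assumed background correction), and the function names are hypothetical. The gene sets passed in are whatever G1/S and G2/M lists the analyst supplies.

```python
def mean_expr(cell, genes):
    """Mean expression of the genes from `genes` that are present in the cell."""
    values = [cell[g] for g in genes if g in cell]
    return sum(values) / len(values) if values else 0.0


def assign_phase(cell, s_genes, g2m_genes):
    """cell maps gene -> normalized expression and must not be empty."""
    # Assumed background: the cell's mean expression over all measured genes.
    background = sum(cell.values()) / len(cell)
    s_score = mean_expr(cell, s_genes) - background
    g2m_score = mean_expr(cell, g2m_genes) - background
    # Cells expressing neither gene set above background are called G1.
    if s_score <= 0 and g2m_score <= 0:
        return "G1"
    return "S" if s_score > g2m_score else "G2M"
```

Because the two gene sets are expected to be anticorrelated, a cycling cell should score clearly higher on one of them, which is what the final comparison relies on.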


WGCNA analysis in categorizing genes 

WGCNA analysis was performed by R package “WGCNA” (R version
3.4.3, https://cran.r-project.org/src/contrib/Archive/WGCNA; package 
version 1.6.6). The WGCNA soft power value was determined by navigat- 
ing the soft-threshold-mean-connectivity curve. Modules with <0.25 
similarity were merged. Modules correlated with a specific cell subtype
were considered as standard modules for categorizing genes into cer- 
tain cell subtypes. Seven modules were selected for neuron subtypes. 


ATAC library preparation for high-throughput sequencing 
ATAC-seq was performed as described previously. In brief, a total of
50,000 cells were washed twice with 50 µl of cold PBS and resuspended
in 50 µl lysis buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2,
0.1% (v/v) Nonidet P40 Substitute). The suspension of nuclei was then
centrifuged for 10 min at 500g at 4 °C, followed by the addition of 50
µl transposition reaction mix (10 µl 5× TTBL buffer, 4 µl TTE mix and
36 µl nuclease-free H2O) from the TruePrep DNA Library Prep Kit V2
for Illumina (Vazyme Biotech). Samples were then incubated at 37 °C
for 30 min. DNA was isolated using a QIAquick PCR Purification Kit
(QIAGEN). ATAC-seq libraries were first subjected to five cycles of pre- 
amplification. To determine the suitable number of cycles required for 
the second round of PCR, the library was assessed by quantitative PCR 
as described previously and then PCR amplified for the appropriate
number of cycles. Libraries were purified with a QIAquick PCR Purifica- 
tion Kit (QIAGEN). Library quality was checked using a High Sensitivity 
DNA Analysis Kit (Agilent). Finally, 2 x 150 paired-end sequencing was 
performed on an Illumina HiSeq X-10. 


ATAC-seq data analysis 

In simple terms, we removed adaptor sequences and then mapped reads
to the hg19 reference genome with the parameters -t -q -N 1 -L 25 -X 2000
using Bowtie2 (version 2.3.4.3). All unmapped reads, non-uniquely
mapped reads and PCR duplicates were removed. The uniquely mapped
reads were shifted by +4 or −5 bp according to the strand of the read.
To visualize the ATAC-seq signal, we extended each read by 50 bp and 
counted the coverage for each base. All the ATAC-seq peaks were called 
by MACS2 v2.1.1 with the parameter --nolambda.
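
The read shifting and coverage counting can be illustrated with a short Python sketch. The +4/−5 bp offsets come from the text; everything else is an assumption made for the example, including the 0-based half-open coordinates, the choice to lengthen each read by 50 bp at its 3' end, and the function names.

```python
def shift_read(start, end, strand):
    """Shift a read by +4 bp (plus strand) or -5 bp (minus strand)."""
    if strand == "+":
        return start + 4, end + 4
    return start - 5, end - 5


def extend_read(start, end, strand, ext=50):
    """Lengthen a read by `ext` bp in its direction of sequencing (assumed)."""
    if strand == "+":
        return start, end + ext
    return start - ext, end


def coverage(reads, length):
    """Per-base coverage over a region of `length` bp.

    reads is a list of (start, end, strand) with 0-based, half-open coordinates.
    """
    diff = [0] * (length + 1)
    for start, end, strand in reads:
        s, e = shift_read(start, end, strand)
        s, e = extend_read(s, e, strand)
        # Clip to the region boundaries.
        s, e = max(0, s), min(length, e)
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    cov, running = [], 0
    for d in diff[:length]:
        running += d
        cov.append(running)
    return cov
```

The difference-array approach keeps the coverage pass linear in the number of reads plus the region length, which matters for genome-scale signal tracks.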


ATAC-seq data quality control 

ATAC-seq data quality was evaluated for several parameters, including 
the number of raw reads, alignment rate, percentage of reads mapped to
chromosome M, percentage of reads mapped to repeat regions (black 
list), percentage of reads that passed MAPQ score filter, percentage 
of total signal within known artefact regions and correlation between 
replications. 


Connecting transcription factors to target genes 

To find the potential transcription factors that bind the PROX1 regulatory
sequence (TSS ± 2 kb), FIMO from MEME Suite (version 5.0.4) was
used for motif enrichment analysis. To investigate the genes that are
regulated by PROX1, the PROX1 motif profile was downloaded from the
Jaspar database (http://jaspar.genereg.net/), and we used FIMO from
the MEME suite for enrichment analysis of our peaks. 
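
The general idea behind motif scanning, scoring each sequence window against a position weight matrix, can be sketched as below. This is not FIMO: it scans the forward strand only, reports raw log-odds scores rather than P values, and uses a uniform background with a small pseudocount. The function names and the toy motif in the usage note are hypothetical and are not the JASPAR PROX1 profile.

```python
import math


def log_odds_matrix(freqs, background=0.25, pseudo=0.01):
    """Convert per-position base frequencies into log2-odds scores.

    freqs is a list with one {base: probability} dict per motif position.
    """
    return [
        {b: math.log2((col.get(b, 0.0) + pseudo) / (1 + 4 * pseudo) / background)
         for b in "ACGT"}
        for col in freqs
    ]


def scan(sequence, pwm, threshold):
    """Return (position, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        # Skip windows containing ambiguous bases such as N.
        if any(b not in "ACGT" for b in window):
            continue
        score = sum(pwm[k][b] for k, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits
```

For example, a toy three-base motif built with `log_odds_matrix([{"A": 1.0}, {"C": 1.0}, {"G": 1.0}])` and scanned over `"TTACGTT"` with a threshold of 5.0 reports a single hit at position 2.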


Immunofluorescent staining 

Tissue samples were fixed overnight in 4% paraformaldehyde, cryoprotected
in 30% sucrose, and embedded in optimal cutting temperature
(Thermo Scientific). Thin 40-µm cryosections were collected on
superfrost slides (VWR) using a Leica CM3050S cryostat. For immunohistochemistry,
heat-induced antigen retrieval was performed in 10
mM sodium citrate buffer, pH 6. Primary antibodies: mouse anti-CD45
(1:100, Abcam ab8216), goat anti-SOX2 (1:250, Santa Cruz sc-17320), rabbit
anti-PAX6 (1:500, BioLegend 901301), rabbit anti-NEUROD2 (1:500,
Abcam ab104430), mouse anti-NEUROD1 (1:100, Abcam ab60704), rabbit
anti-HOPX (1:1,000, Santa Cruz sc-30216), mouse anti-Ki67 (1:100,
BD 550609), mouse anti-SATB2 (1:250, Abcam ab51502), mouse anti-
MEIS2 (1:200, Santa Cruz sc-81986), rabbit anti-PROX1 (1:500, Abcam
ab199359), rabbit anti-OLIG2 (1:500, Millipore AB9610), human anti- 
MBP (1:1,000, Abcam ab209328), mouse anti-GFAP (1:200, CST 3670S) 
diluted in blocking buffer containing 10% donkey serum, 0.5% Triton- 
X100 and 0.2% gelatin diluted in PBS at pH 7.4. Binding was revealed 
using an appropriate Alexa Fluor 488, Alexa Fluor 594, or Alexa Fluor 
647 fluorophore-conjugated secondary antibody (Life Technologies). 
Cell nuclei were counterstained using DAPI (Life Technologies). Images 
were collected using an Olympus FV1000 confocal microscope. 


In situ hybridization

The in situ hybridization protocol has been described previously.
In brief, probes complementary to target human mRNA used for RNA 
in situ hybridization were cloned from primary human fetal cortical 
cDNA samples and reverse-transcribed using PrimeScript II 1st Strand
cDNA Synthesis Kit (Takara) with oligo dT primers. Total RNA was iso- 
lated from GW27 human hippocampus using SV Total RNA Isolation 
System (Promega). Specific genes were amplified using the following 
primers: SEMA5A forward AGC TCG CTT GGC TTT AGT CTT A, reverse
CAA AAT AGG CTT TGA CTC CCA C; PID1 forward TGG GAT CTC TAG
TGG GGT GG, reverse TAA GGC TTC TTA GGT GCC GC; SULF2 forward
GTT TGA CAT CAG GGT CCC GT, reverse CTT TAA TGG GGT TGG CGG
CT; NRIP3 forward AGC TGT GGT TGA TGA CAA TGA G, reverse CTG
TAA TGG ATA ATG TCC CTG G; STX10 forward GGG GAA GGG ACT GAC
ATG TC, reverse GGA GGG CTG GGG TCA GAG AG; CHMP4A forward
GAT TGG GCA AGG CTG GTC CC, reverse TTG GGA GCT GGC CCT GCC
GG; BEX5 forward TCA ACA TGG AAA ATG TCC CC, reverse AGA CTG
CTT TTA AAT TGC TT; NBPF1 forward GGG TGC ACC AAG AGC AGC
CT, reverse CCT CAG CAT AAA TTT TAT GA; CASC15 forward CAA GCA
TGT AGC CCT GCC CG, reverse CTC TGT TTC TGT CAT CTC TC; primers
specific to target genes of interest were designed using Primer3 and 
amplified by PCR using Q5 High-Fidelity DNA Polymerase (NEB). PCR 
products of predicted band size were gel extracted and ligated into the 
Hieff Clone Plus One Step Cloning Kit (Yeason). Ligation products were 
transfected into Trans5α Chemically Competent E. coli (Transgene).
Cloned sequences were confirmed by sequencing. Digoxigenin-labelled 
RNA probes for in situ hybridization were generated by linearizing 
the pSPT18 Vector and in vitro transcribing the probe using T7 or SP6 
RNA Polymerase (Roche) in the presence of DIG-RNA Labelling Mix 
(Roche). Fetal brain sections of 30 µm thickness were hybridized with
RNA probes at a final concentration of 500 ng/ml overnight at 64.5 °C
in hybridization solution (50% formamide, 10% dextran sulfate, 0.2%
tRNA (Invitrogen), 1× Denhardt’s solution (Sigma) and 1× salt solution
(containing 0.2 M NaCl, 0.01 M Tris, 5 mM NaH2PO4, 5 mM Na2HPO4,
5 mM EDTA pH 7.5)). After the sections were washed, alkaline
phosphatase-coupled anti-digoxigenin Fab fragments (Roche) were 
applied. For visualization of the labelled cRNAs, the sections were incu- 
bated in the dark in NBT/BCIP solution (Roche). Images were taken 
using a Leica SCN400 (Leica Microsystems). 


Plasmids and in utero electroporation 

NBPF1 genes were cloned into a pEGFP-C1 vector. Electroporation was 
performed as previously described. In brief, timed pregnant CD-1 mice
(E13.5) were deeply anaesthetized with isoflurane, and the uterine horns
were exposed through a midline incision. 1 µl of plasmid DNA (1-2 µg/µl)
mixed with Fast Green (Sigma) was manually microinjected into
the fetal brain lateral ventricle through the uterus, using a bevelled
and calibrated glass micropipette (Drummond Scientific) followed
by five 50-ms pulses of 50 mV with a 1-s interval delivered across the


uterus with two 9-mm electrode paddles positioned on either side of 
the head (BTX, ECM830). 


Patch-qRT-PCR of NBPF1-GFP plasmid overexpressed cells 

Coronal slices containing cells overexpressing the NBPF1-GFP plasmid
were prepared using a vibratome (VT1200S, Leica, Wetzlar, Germany) in
oxygenated (95% O2 and 5% CO2) ice-cold sucrose-based artificial cerebrospinal
fluid (s-ACSF, 234 mM sucrose, 2.5 mM KCl, 26 mM NaHCO3, 1.25 mM
NaH2PO4, 11 mM D-glucose, 0.5 mM CaCl2 and 10 mM MgSO4). The
slices were kept in an incubating chamber filled with oxygenated ACSF
(126 mM NaCl, 3 mM KCl, 1.2 mM NaH2PO4, 2.4 mM CaCl2, 1.3 mM MgSO4,
26 mM NaHCO3, 10 mM D-glucose) at 34 °C for 30 min. After a recovery
period of at least 60 min at room temperature, an individual slice was 
transferred to a recording chamber and was continuously superfused 
with oxygenated ACSF (4 ml/min) at room temperature. We captured 
whole cells overexpressing the NBPF1-GFP plasmid and distributed 
each into a single tube, and then we used SMART-seq2 to amplify the 
mRNA intoacDNA library. Then, we used qRT-PCR to detect NBPF1 and 
LHX2gene expression. Specific genes were amplified using the following 
primers: GAPDH forward GTC AAG CTC ATT TCC TGG TAT GAC, reverse 
TAT GGG GGT CTG GGA TGG AA; NBPF1 forward GCG AGG CTG CCC GAG 
CTT CT, reverse GAC TTC GCG TAA CTT CCC ATT CA;LHX2 forward GAA 
CGA TGC TGA ACA CCT GG, reverse AAC CAG ACC TGG AGG AC TCT C. 
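
The qRT-PCR readout is reported relative to GAPDH, but the text does not state the formula used. The sketch below assumes the common 2^(-ΔCt) normalization expressed as a percentage of the reference gene; the function name and the Ct values are hypothetical and are not data from the study.

```python
def relative_expression_percent(ct_target: float, ct_reference: float) -> float:
    """Target expression as a percentage of the reference gene (assumed 2^-dCt x 100)."""
    delta_ct = ct_target - ct_reference  # higher Ct means less template
    return 100.0 * 2.0 ** (-delta_ct)


# Hypothetical Ct values for one captured cell (illustration only).
example_ct = {"GAPDH": 20.0, "NBPF1": 22.0, "LHX2": 23.0}

for gene in ("NBPF1", "LHX2"):
    pct = relative_expression_percent(example_ct[gene], example_ct["GAPDH"])
    print(f"{gene}: {pct:.2f}% of GAPDH")
```

Under this assumption, a target that crosses threshold two cycles after GAPDH is reported at 25% of the GAPDH level.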


Statistical analysis 

Comparisons between two groups were made using t-tests. The
quantification graphs were analysed using GraphPad Prism (GraphPad
Software). Sample sizes and P values are given in the figure legends.
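
As an illustration of the two-group comparison, the sketch below computes a two-sample t statistic and its degrees of freedom with the Python standard library. It assumes the pooled-variance (Student's) form; the text does not say whether equal variances were assumed, and the function name and input values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (sample variances, n - 1 denominator).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical measurements for two groups (not data from the study).
control = [1, 2, 3, 4, 5]
treated = [2, 4, 6, 8, 10]
t_value, dof = pooled_t_statistic(control, treated)
print(f"t = {t_value:.4f}, df = {dof}")
```

A two-sided P value follows from the t distribution with `dof` degrees of freedom; a statistics package such as GraphPad Prism or `scipy.stats.ttest_ind` performs that step.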


Reporting summary 


Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 

Extended Data Fig. 1 | Single-cell RNA-seq information and molecular
diversity of single cells. a, Scheme of bioinformatic analysis. b, Expression of
known markers shown using the same layout as in Fig. 1b. Grey, no expression;
red, relative expression. c, Heat map showing the expression level and identity
progenitors, 2,486; DG EXN, 2,516; CGE-derived InN, 5,375. d, t-SNE plots of cells
in the hippocampus. Two repetitions of GW22 are labelled in different shapes,
batches from the same embryo stages. Each cell colour represents the
gestational week. Sample size: GW16, 4,411 cells; GW18, 4,035 cells; GW20,

Extended Data Fig. 2 | Molecular diversity of subgroups of cells.
a-c, Heat maps show the subclasses of inhibitory neurons (a), astrocytes (b)
and oligodendrocytes (c). The genes are organized into clusters. The bar chart
on the top shows the gestational week. Specific genes related to each subtype
are highlighted on the right with enriched GO terms. Interneurons: 3,189,
2,334, 909, 670, 1,765, 1,073, 1,019, 793 cells; astrocytes: 275, 95, 141, 134, 58
cells; oligodendrocytes: 103, 227, 131, 282, 257, 459 cells. d, The enriched GO
terms show the cell properties of the hippocampus in different cell types.
Progenitors, 2,486 cells; excitatory neurons, 10,715 cells. e, Immunostaining
for oligodendrocyte markers at GW16 showing the position and morphology
of oligodendrocytes in human prefrontal cortex. Scale bar, 500 μm. The
experiment was repeated three times independently with similar results.

Extended Data Fig. 3 | Molecular diversity of progenitors in the
hippocampus. a, Heat map showing the expression levels and identities of
genes in the progenitor subclasses. Known gene expression in each type and
GO enrichments are shown to the right. The graph above shows the distribution
of each subclass by gestational week, and the graph below shows the
subclusters of progenitors. Clusters 1-8: 204, 159, 483, 730, 397, 300, 139 and 74
cells. b, Dot plot for known markers of subtypes of progenitors in Fig. 2b. The
size of each dot represents the percentage of cells in each cluster. Grey-to-blue
gradient shows low-to-high gene expression. Progenitors: 204, 159, 483, 730,
397, 300, 139 and 74 cells. c, Dot plot for novel markers of subtypes of
progenitors. The size of each dot represents the percentage of cells in each
cluster. Grey-to-red gradient shows low-to-high gene expression. Progenitors:
204, 159, 483, 730, 397, 300, 139 and 74 cells. d, Abstracted graph shows the
connection on the transcriptome of different subtypes in the developing
human hippocampus. Each dot represents a single cell, and cell colour
represents the cell type. e, Abstracted graph shows the connection on the
transcriptome of all subtypes in the developing human hippocampus. Each dot
represents a single cell, and cell colour represents the cell type. f, Abstracted
graph shows the connection on the transcriptome of different weeks in the
developing human hippocampus. Each dot represents a single cell, and cell
colour represents the week. g, h, Visualization of eight subtypes of progenitors
in the developing human hippocampus using t-SNE (g), and expression of
known markers using the same layout (h). Grey, no expression; red, relative
expression.

Extended Data Fig. 4 | Immunostaining of progenitors in the developing
hippocampus. a, Immunofluorescence images of PAX6 and SOX2 at GW11.
Scale bar, 2,000 μm. b, Immunofluorescence images of HOPX and SOX2 at
GW11. Scale bar, 2,000 μm. c, Immunofluorescence images of PROX1, PAX6,
HOPX and SOX2 at GW14. Scale bar, 1,000 μm. d-i, Immunofluorescence
images of PROX1, PAX6, HOPX and SOX2 in GW16 (d-f) and GW22 (g-i). Scale
bar, 500 μm. The experiment was repeated three times independently with
similar results.

Extended Data Fig. 5 | Immunostaining of developing hippocampus.
a, b, Immunofluorescence images of PAX6, HOPX and MKI67 at GW25. Scale
bar, 500 μm. c, d, Cell cycle analysis of PAX6+ (c) or HOPX+ (d) progenitors.
e, Immunofluorescence images of PROX1 in GW25 to show granule cell layer.
Scale bars, 500 μm (left); 100 μm (right, panels 1-3). f-i, Immunofluorescence
images of PAX6, HOPX, NEUROD1 and GFAP at GW25. Scale bar, 500 μm. The
experiment was repeated three times independently with similar results.
j, k, The maturation scores of PAX6+ (j) and HOPX+ (k) progenitors.

Extended Data Fig. 6 | Molecular diversity of excitatory neurons in the
hippocampus. a, b, Expression of known markers (a) and new markers (b)
shown using the same layout as in Fig. 3a. Grey, no expression; red, relative
expression. c, Cluster dendrogram showing the modules selected to calculate
the gene network in Fig. 3h. d, The cluster trees and heat map show the
correlation of different gene modules in excitatory neurons.

Extended Data Fig. 7 | Molecular diversity of inhibitory neurons in the
hippocampus. a, GO analysis of modules created by clustering the two main
branches from the lineage tree. The analysis reflects cell fate commitment. In
this heat map, the middle represents the start of pseudo-time. From this point,
one lineage moves to the CA and the other moves to the DG. Rows are GO terms
correlated into different modules. Sample size: 12,115 cells. b, c, Expression of
known markers shown using the same layout as in Fig. 3k. d, Expression of novel
markers of MGE-derived inhibitory neurons shown using the same layout as in
Fig. 3k. Grey, no expression; red, relative expression. e, Expression of novel
markers of CGE-derived inhibitory neurons shown using the same layout as in
Fig. 3k.


Extended Data Fig. 8 | Molecular diversity of microglia in the human
hippocampus. a, Heat map showing the expression levels and identities of
genes in the microglia subclasses. The graph above shows the distribution of
each subclass by gestational week. b, Visualization of ten subtypes of microglia
in the developing human hippocampus using t-SNE. Each dot represents a
single cell, and cells are laid out to show similarities. Each cell colour represents
the cell type. Expression of known markers is shown using the same layout on
the right; grey, no expression; red, relative expression. Microglia: 638, 489,
246, 259, 465, 229, 84, 84, 68, 54 and 44 cells. c, d, Distribution of G1, S, and
G2/M stages of the cell cycle for microglia of different subtypes (c) and at
different gestational weeks (d). e, Immunostaining images of PTPRC and MKI67
at GW25. Scale bars, 500 μm (left), 100 μm (right). The experiment was
repeated three times independently with similar results.

Extended Data Fig. 9 | NBPF family genes in the human hippocampus.
a, Expression of NBPF family genes shown using the same layout as in Fig. 1b.
Grey, no expression; blue, relative expression. b, Domains of NBPF1.
c, Evolutionary history inferred using the neighbour-joining method. The tree
is drawn to scale, with branch lengths in the same units as those of the
evolutionary distances used to infer the phylogenetic tree. The evolutionary
distances were computed using the Poisson correction method and are in the
units of the number of amino acid substitutions per site. The analysis involved
six amino acid sequences. Evolutionary analyses were conducted in MEGA X.
d, Overexpression of NBPF1 promotes DG formation at E13.5, and is observed at
E15.5 in mouse. Scale bars, 500 μm. The experiment was repeated six times
independently with similar results. e, Scheme depicting the position in the
mouse brain at E18.5 of the slice in f. f, Overexpression of NBPF1 promotes DG
formation at E13.5, and this is observed at E18.5 in mouse. Scale bars, 1,000 μm.
The experiment was repeated six times independently with similar results.
g, Flowchart of patch-qRT-PCR. h, Relative expression of specific genes of
GFP+ cells. **P = 0.0020, *P = 0.0408, two-sided t-test; n = 10 GFP cells; 8
NBPF1-GFP cells. Mean ± s.e.m. i, Normalized ATAC-seq profiles of PROX1 in GW25
hippocampus with three independent biological replicates (Rep1, Rep2 and
Rep3) showing the activation of PROX1. The amplifying panel shows the
predicted LHX2 binding sites.

[Figure panels g, h: patch-qRT-PCR workflow after E13.5-E15.5 IUE (1, patch-clamping and imaging; 2, capture GFP-positive cell; 3, positive pressure to transfer cell; 4, mRNA to cDNA, then qRT-PCR); bar charts of relative expression (%), normalized to GAPDH, of NBPF1 and LHX2 in GFP and NBPF1-GFP cells.]

Statistics


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection No software was used for data collection. 


Data analysis Custom written scripts in Cellranger v2.0.1, Python 3.7, Python package: Scanpy v1.3.8, R, R package: Monocle v2.99.0, Seurat v2.3.4, rgl 
v0.99.16, David 6.8, GraphPad Prism 6.0. See Methods for details on how each software is used. 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


The scRNA-seq data and ATAC-seq data used in this study have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE131258.

Life sciences study design


All studies must disclose on these points even when the disclosure is negative. 


Sample size The sample sizes for scRNA-seq and ATAC-seq were determined by the availability of human tissues. We collected 7 hippocampi from embryonic
stages for scRNA-seq, with 2 samples at the same developmental stage as replicates. The final dataset size was determined according to the
quality control criteria described in the Methods. 6 mice were used per time point to quantify the phenotype.


Data exclusions Cells with fewer than 800 detected genes were removed as low-quality cells. Genes expressed in fewer than 30 cells (0.1% of the total
cell number) were excluded, as recommended by Seurat (v.2.3.4).
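
The two exclusion thresholds can be sketched in plain Python as below. This is an illustration rather than the study's pipeline (which used Seurat): the function name and the nested-dictionary input format are hypothetical, and filtering cells before genes is an assumption about the order of operations.

```python
def filter_counts(counts, min_genes_per_cell=800, min_cells_per_gene=30):
    """Filter a {cell: {gene: count}} mapping by the two thresholds.

    Cells with fewer than `min_genes_per_cell` detected genes are dropped first;
    genes detected in fewer than `min_cells_per_gene` retained cells are dropped next.
    """
    # Step 1: keep cells with enough detected (non-zero) genes.
    kept_cells = {
        cell: genes
        for cell, genes in counts.items()
        if sum(1 for c in genes.values() if c > 0) >= min_genes_per_cell
    }
    # Step 2: count, for each gene, the retained cells in which it is detected.
    cells_per_gene = {}
    for genes in kept_cells.values():
        for gene, c in genes.items():
            if c > 0:
                cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1
    kept_genes = {g for g, n in cells_per_gene.items() if n >= min_cells_per_gene}
    # Step 3: restrict every retained cell to the retained genes.
    return {
        cell: {g: c for g, c in genes.items() if g in kept_genes}
        for cell, genes in kept_cells.items()
    }


# Toy matrix with small thresholds so the effect is visible (hypothetical values).
toy = {
    "cell1": {"A": 5, "B": 1, "C": 0},
    "cell2": {"A": 2, "B": 3, "C": 4},
    "cell3": {"A": 1},
}
print(filter_counts(toy, min_genes_per_cell=2, min_cells_per_gene=2))
```

With the defaults, the function applies the thresholds stated above (800 genes per cell, 30 cells per gene).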


Replication For scRNA-seq, there were 2 biological replicates at GW22 and no replicates for other time points. 3 replicates were used in ATAC-seq experiments. 6 mice
were used per time point to quantify the phenotype, as described in the figure legends. All replicates gave consistent results.


Randomization The samples were allocated to experimental groups on the basis of gestational stage. See Methods, 'Tissue sample collection and
dissection'.


Blinding The investigators were blinded to group allocation during data collection and analysis. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems (n/a | Involved in the study): Antibodies; Eukaryotic cell lines; Palaeontology; Animals and other organisms; Human research participants; Clinical data

Methods (n/a | Involved in the study): ChIP-seq; Flow cytometry; MRI-based neuroimaging


Antibodies 


Antibodies used For immunostaining, the following antibodies to the following proteins were used: 
Mouse monoclonal [MEM-28] to CD45, Abcam, ab8216, GR302332-1;
Rabbit polyclonal to NEUROD2, Abcam, ab104430, GR94291-4;
Rabbit polyclonal to PAX6, BioLegend, 901301, B201255;
Goat polyclonal to SOX2, Santa Cruz, sc-17320, H1406;
Mouse monoclonal to NEUROD1, Abcam, ab60704, GR3183945-2;
Rabbit monoclonal [EPR18114] to HMGA2, Abcam, ab207301;
Rabbit polyclonal to HOPX, Santa Cruz, sc-30216, D1615;
Mouse monoclonal [B56] to Ki67, BD Biosciences, 550609, 19679;
Rabbit monoclonal [EPR19273] to PROX1, Abcam, ab199359, GR45436-1;
Rabbit polyclonal to OLIG2, Millipore, ab9610, 3018858;
Human monoclonal [IGX3421] to Myelin Basic Protein, Abcam, ab209328, GR278417-4;
Mouse monoclonal [GA5] to GFAP, CST, 3670S, 6.


Validation All antibodies were validated by the supplier for human or mouse samples and by comparing to the manufacturer's or in-house 
results. 


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals CD1 mouse, female, timed pregnant at E13.5. 
Wild animals This study did not involve wild animals.


Field-collected samples This study did not involve samples collected from the field.


Ethics oversight Mouse housing and experimental protocols in this study were in compliance with the guidelines of the Institutional Animal Care 
and Use Committee of the Institute of Biophysics, CAS. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Human research participants 


Policy information about studies involving human research participants 


Population characteristics All embryonic and fetal tissues were between 16 and 27 gestational weeks. Gestational age was measured in weeks from the first day
of the woman's last menstrual cycle to the sample collection date.


Recruitment Beijing Anzhen Hospital was in charge of recruiting donors for this research. The patients first decided to have an abortion, and
were then asked whether they would agree to donate the fetal tissues to this study. The de-identified human fetal tissue
samples were collected after the donor patients had signed an informed consent document. The tissue collection and research
protocols were approved by the Reproductive Study Ethics Committee of Beijing Anzhen Hospital and the institutional review
board (ethics committee) of the Institute of Biophysics.


Ethics oversight The Reproductive Study Ethics Committee of Beijing Anzhen Hospital; the institutional review board (ethics committee) of the
Institute of Biophysics.


Note that full information on the approval of the study protocol must also be provided in the manuscript. 

Our understanding of how human embryos develop before gastrulation, including
spatial self-organization and cell type ontogeny, remains limited by available
two-dimensional technological platforms that do not recapitulate the in vivo
conditions. Here we report a three-dimensional (3D) blastocyst-culture system that
enables human blastocyst development up to the primitive streak anlage stage. These
3D embryos mimic developmental landmarks and 3D architectures in vivo, including
the embryonic disc, amnion, basement membrane, primary and primate-unique
secondary yolk sac, formation of anterior-posterior polarity and primitive streak
anlage. Using single-cell transcriptome profiling, we delineate ontology and
regulatory networks that underlie the segregation of epiblast, primitive endoderm
and trophoblast. Compared with epiblasts, the amniotic epithelium shows unique
and characteristic phenotypes. After implantation, specific pathways and
transcription factors trigger the differentiation of cytotrophoblasts, extravillous
cytotrophoblasts and syncytiotrophoblasts. Epiblasts undergo a transition to
pluripotency upon implantation, and the transcriptome of these cells is maintained
until the generation of the primitive streak anlage. These developmental processes
are driven by different pluripotency factors. Together, findings from our 3D-culture
approach help to determine the molecular and morphogenetic developmental
landscape that occurs during human embryogenesis.


Technical limitations preclude the precise delineation of early human
embryogenesis, such as architecture formation and cell-type specification.
Recent in vitro implantation platforms using a two-dimensional
(2D) culture approach have revealed some developmental landmarks
of early human embryos in vivo. However, these 2D culture embryos
are largely flattened, which creates an imperfect model of normal
3D embryonic development in vivo and limits classification using
equivalent Carnegie stages. Although pluripotent stem cells can
model some phenotypes of the human amnion sac, amniogenesis or
organizer, we still cannot authentically mimic human embryonic
development, especially for embryo lineage ontogeny. These existing
methods exclude crosstalk among different cell types in embryos.
Here, we report a 3D-culture system that enables development of the
human blastocyst up to the primitive streak anlage (PSA) stage. Using
the 3D platform, we reveal a developmental landscape of human
pre-gastrulation embryos.


3D architectures in 3D-cultured embryos 

We used donated human embryos from surplus embryos after clinical
in vitro fertilization. We tested whether the culture media used for
2D human embryo cultures (IVC1 and IVC2) were suitable for 3D
culture of blastocysts. These media sustained only 6.3% of embryos
until 14 days post-fertilization (d.p.f.) using morphological embryo
observations (Extended Data Fig. 1a-f). We then added sodium lactate,
sodium pyruvate and ROCK inhibitor (Y27632) to IVC1 and IVC2 media,
resulting in modified IVC1 (mIVC1) and mIVC2, respectively. Cultures in
mIVC1 and mIVC2 sustained 23.4% of human blastocysts until 14 d.p.f.
(Extended Data Fig. 1a-f).

We then designed a series of 3D extracellular matrices with Matrigel
embedding to identify an ideal 3D blastocyst-culture system. We verified
embryonic development using morphological observations and
staining for specific lineage markers: OCT4 for the inner cell mass (ICM)
and epiblast (EPI), GATA6 for primitive endoderm/hypoblast (PrE) and
CK7 for trophoblast (TrB) (Extended Data Fig. 1g-j). We found that 10%
Matrigel yielded the best outcome and enabled 23.5% of blastocysts to
develop to 14 d.p.f. with normal embryonic structures (Extended Data
Fig. 1g-m). We used 10% Matrigel in conjunction with mIVC1 and mIVC2
to culture blastocysts unless noted otherwise (Fig. 1a).

Morphological observations revealed that we could culture human
blastocysts up to 14 d.p.f. using our 3D platform (Fig. 1b). Nearly all
blastocysts at 5-6 d.p.f. were positive for GATA6, but negative for CK7,
whereas the ICM expressed OCT4, NANOG, KLF17 and PRDM14 (Fig. 1c, d,
Extended Data Fig. 9b). At 7 or 8 d.p.f., GATA6, CK7 and OCT4 showed

Fig. 1 | Human embryos self-organize in 3D architectures using our
3D blastocyst-culture conditions. a, Schematic of our in vitro 3D-culture
platform for human blastocysts. b, Bright-field images of human blastocysts
developing up to 14 d.p.f. c-e, Immunostaining of 5-6-d.p.f. (c, d, 3 out of 3
embryos) and 7-d.p.f. (e, 2 out of 2 embryos) blastocysts with markers as shown.
Red arrowheads (insets) denote ICM; white arrowheads denote
NANOG+PRDM14+ cells. f-h, Representative staining of an 8-d.p.f. (f, 3 out of 3
embryos) and a 10-d.p.f. (g, 4 out of 5 embryos; h, 3 out of 3 embryos) embryo


mutually exclusive expression (Fig. 1e, f), which indicates that EPI,
PrE and TrB cells gained greater molecular and physical specificity.
TrB cells displayed strong filamentous CK7 staining (Fig. 1f). EPI cells
began to polarize and rearrange radially (Fig. 1f). During development,
the human yolk sac has two developmental phases: a primary yolk sac
(PYS) develops between 7 and 9 d.p.f., followed by the formation of
the secondary yolk sac (SYS) at 12-13 d.p.f. We did observe the
appearance of a small PYS surrounded by GATA6+ PrEs (Fig. 1f, Extended
Data Fig. 2a). At 10 d.p.f., radial arrangements indicated polarity and
epithelialization in EPIs by organized distributions of podocalyxin
(PODXL) (Fig. 1g, h). We detected a distinct PYS (Fig. 1g, Extended Data
Fig. 2b). At 12 d.p.f., an amniotic cavity distinctly separated a group of
thinner squamous amniotic epithelium (AME) from more columnar EPIs
(Fig. 1i). By contrast, the amniotic cavity and yolk sac in 2D-cultured
embryos appeared to collapse at this stage. An obvious AME-EPI
separation was not observed in 2D-cultured embryos.

At 14 d.p.f., embryonic diameter and thickness measured over
500 μm and 400 μm, respectively (Extended Data Fig. 1k). The bilaminar
disc maintained continuous growth with an obvious SYS (Fig. 1j). As a
distinctive feature of anthropoid primates, the SYS is composed of
irregular visceral endoderm and squamous parietal endoderm based
on nuclear shape (Fig. 1j, k), which shows considerable concordance
with anatomical descriptions from in vivo monkey and human
13-14-d.p.f. embryos. Squamous parietal endoderm expressed GATA6
(Extended Data Fig. 2). The white and red arrows in f denote epiblast and
primary yolk sac, respectively. The white arrowhead in g denotes the amniotic
cavity. i, Representative staining of a 12-d.p.f. embryo (6 out of 8 embryos). Red
arrowheads denote EPIs; white arrows denote AME. j, Representative staining
of 14-d.p.f. embryos (4 out of 6 embryos). k, Magnification of the square shown
in j, rotated 90° clockwise. AC, amniotic cavity; PE, parietal endoderm (white
arrows) (k2); VE, visceral endoderm (red arrows) (k1). Scale bars, 100 μm (b),
50 μm (c-g, i, j) or 10 μm (h). See also Extended Data Figs. 1, 2.


(Fig. 1k), which suggests that it arises from PrEs that rapidly give rise
to the visceral endoderm and parietal endoderm after implantation.
Embryos displayed a 3D spherical structure with a disc-shaped bilaminar
structure, SYS and amnion, and with the TrBs on the amnion side
of the embryo initiating trophoblast differentiation and undergoing
spatial and asymmetric development to concentrate at the amnion side
(Extended Data Fig. 1n-q, Supplementary Videos 1, 2). Together, our
3D platform can support self-organization with spatial architectures
of human embryos.


Delineating lineage by transcriptome 

We performed single-cell RNA sequencing (scRNA-seq) on 557 cells from 42
embryos at 7 developmental stages to profile the transcriptome of our
3D-cultured embryos (Fig. 2a). Following quality control and stringent
filtering, we used 555 single cells with 23,270 genes for subsequent
analyses (Extended Data Fig. 3a-c). t-distributed stochastic neighbour
embedding (t-SNE) analyses revealed seven clusters, identified as ICM,
EPI, PrE, TrB (including cytotrophoblasts (CTBs), syncytiotrophoblasts
(STBs) and extravillous cytotrophoblasts (EVTs)) and PSA-EPI based on
lineage-specific marker expression and developmental time (Fig. 2b-d).
Continuous transcriptome shifts from 6 to 14 d.p.f. reflected a
transition from pre- to post-implantation (Fig. 2d). Integrated analysis
of scRNA-seq data from different embryo sources revealed that

Fig. 2 | Lineage delineation by transcriptome using scRNA-seq. a, Schematic
of single-cell collection and transcriptome analyses. b-d, t-SNE analyses
revealed seven clusters, identified as the ICM, EPI, PrE, TrB (including CTBs,


lineage segregation had occurred, although incompletely, in human 6-d.p.f.
blastocysts regardless of sample source, and cell
fates appeared fixed in 7-9-d.p.f. embryos (Extended Data Fig. 3d,
e). Thus, we used scRNA-seq in 7-9-d.p.f. 3D embryos to examine the
regulators that segregate TrBs, PrEs and EPIs. We observed EPI-, PrE- and
TrB-specific genes associated with different signalling pathways and
transcription factors (Extended Data Fig. 3f-j, Supplementary Table 1).
Lineage-specific gene comparison between previous results and our
results showed that core lineage transcription factors are maintained
across different samples, while some differences in gene expression
may be attributable to the different developmental stages of the embryos
(Extended Data Fig. 3k, Supplementary Table 1.6).


AME-EPI separation 


In contrast to mouse EPI, the proximal EPI in primates segregates an
additional AME lineage before gastrulation. However, human AME
remains poorly characterized in the absence of good molecular markers. As tissue
morphogenesis requires cell-cell adhesion proteins, we examined
E-cadherin (also known as CDH1) and observed a symmetrical distribution on the cell
membrane of columnar EPIs, while squamous AME displayed no to weak
E-cadherin expression (Extended Data Fig. 4a, b). Whereas E-cadherin was
widely distributed in human EPIs, it is concentrated on the apical
side of wedge-shaped mouse EPIs. Amnion formation requires signalling
from the basement membrane generated by visceral endoderm.
The layer between PrEs and EPIs formed a laminin-containing basement
membrane, enveloping EPIs but not AME (Extended Data Fig. 4c, d).
The AME weakly expressed OCT4, NANOG and SOX2 (Extended Data
Fig. 4b, f). At 6 and 8 d.p.f., E-cadherin was ubiquitously distributed
on EPI and TrB cell membranes, whereas laminin enveloped the entire
EPI cluster and was widely expressed in TrBs (Extended Data Fig. 4g-j).


STBs and EVTs) and PSA-EPI based on classical lineage-specific marker 
expression (c) and developmental time (d). FPKM, fragments per kilobase of 
transcript per million mapped reads. See also Extended Data Fig. 3. 


At 10 d.p.f., laminin concentrated around EPIs to form the basement
membrane but was lost in the AME (Extended Data Fig. 4k, l).

We next checked the expression of EZRIN and WGA, which localize to
the apical surfaces of human-pluripotent-stem-cell-derived amnion.
EZRIN was distributed equally on the apical surfaces of EPIs and AME
(Extended Data Fig. 4m). However, WGA was detected in extra-embryonic
cells, but not in EPIs and AME (Extended Data Fig. 4n), which indicates
differences between 3D-embryo-derived and human-pluripotent-stem-cell-derived
amnion. Given that obvious separation of AME-EPI occurred at
12 and 14 d.p.f., we analysed scRNA-seq from 12- and 14-d.p.f. EPIs in the
post- and PSA-EPI clusters (Fig. 2b). The t-SNE analysis classified these EPIs
into three clusters, termed AME, intermediate state cells and EPIs, on the
basis of their gene-expression profiles (Extended Data Fig. 4o, p).
Compared to EPIs, AME significantly downregulated pluripotency genes and
upregulated genes expressed in the AME of 12-17-d.p.f. monkey embryos
(TFAP2C, MSX2 and BMP4) or self-organized amnion from human
pluripotent stem cells (TFAP2A and GATA3) (Extended Data Fig. 4p). High
expression of hormone genes in AME (Extended Data Fig. 4q-u,
Supplementary Table 2) corresponds to the AME of human placentas producing
human chorionic gonadotropin (hCG). These results indicate the AME
is a distinct population with specific phenotypes compared to EPIs.


Forming anterior—posterior polarity and PSA 

The primitive streak remains a poorly defined and enigmatic structure
in human embryos. One hallmark of primitive streak formation
is the epithelial-mesenchymal transition and upregulated N-cadherin
(CDH2), a mesenchymal marker. We found that N-cadherin localized in
PrEs before 12 d.p.f. and was activated in some OCT4-expressing cells
outside the EPI or near the AME-EPI junction at 14 d.p.f. (Fig. 3a, b). The
result was confirmed by scRNA-seq data (Fig. 2c, Extended Data Fig. 51). 

Fig. 3 | Human embryos display anterior-posterior polarity and formation
of PSA. a, b, Staining of embryo sections at 12 d.p.f. (a, 3 out of 3 embryos) and
14 d.p.f. (b, 4 out of 5 embryos) with N-cadherin. Red arrows denote N-cadherin+
PrEs; white arrows denote N-cadherin+ EPIs migrating out from the disc (b).
c-f, Staining of embryo sections at 14 d.p.f. with T, LEFTY1, CER1 and HESX1.
In c, red arrows denote T+ cells (5 out of 8 embryos). In d, white arrows denote
LEFTY1+ and red arrows denote T+ cells (3 out of 4 embryos). In e, white arrows
denote CER1+ cells; red arrows denote T+ cells (e, 2 out of 2 embryos). In f, white
arrows denote T−HESX1+ EPIs; red arrows denote T+HESX1− EPIs (2 out of 2
embryos). g, h, Staining of human embryo sections at 14 d.p.f. In g, white arrows
denote T+ cells; red arrows denote N-cadherin+ cells (4 out of 6 embryos). In h,
white arrows denote T+OCT4+GATA6+ cells; asterisks denote T+OCT4+GATA6−
cells (3 out of 4 embryos). Scale bars, 50 μm (a-e, g) or 25 μm (f, h). See also
Extended Data Figs. 5-7.


During embryonic development in mice, the anterior visceral
endoderm secretes LEFTY1 and CER1 to antagonize posteriorizing
morphogens, and OTX2 regulates the anterior visceral endoderm and the
anterior-posterior axis. We observed some EPIs that expressed T-box
transcription factor T (T, also known as brachyury or TBXT), an early
marker for primitive streak, but repressed OCT4; these cells were near the
AME compartment boundary at 14 d.p.f. but not at 12 d.p.f. (Fig. 3c,
Extended Data Fig. 5a). Although we did not observe mutually exclusive
expression of SOX2 (anterior commitment) and NANOG (posterior
commitment) in EPIs, we observed CER1, LEFTY1 and OTX2 expression
on one side of the 14-d.p.f. embryonic disc (Fig. 3d, e, Extended Data
Fig. 5b-e), which suggests formation of the anterior visceral endoderm.
In contrast to mice, in which OTX2 is expressed in embryonic cells but not
in the extra-embryonic compartment, OTX2 was expressed in the PrEs of our 3D
embryos (Extended Data Figs. 3f, 5e). The T-expressing EPIs were located
on the side opposite to the anterior visceral endoderm (Fig. 3d, e).

Next, we examined the expression of HESX1, a marker of EPI,
early anterior differentiation or visceral endoderm. We observed that
HESX1 was uniformly expressed in 12-d.p.f. EPIs (Extended Data Fig. 5f).
Some EPIs lost HESX1 but upregulated T expression at 14 d.p.f. (Fig. 3f).
Furthermore, T⁻HESX1⁺ and T⁺HESX1⁻ EPIs localized on opposite sides of
the EPI, representing anterior and posterior EPIs. scRNA-seq revealed
decreased expression of HESX1 in the PSA-EPIs (14 d.p.f.) (Extended Data
Fig. 5g). EPIs from 14-d.p.f. embryos branched to produce two populations:
T⁻HESX1⁺ and T⁺HESX1⁻ cells (Extended Data Fig. 5h), which further
defines anterior and posterior cell fate. T⁺HESX1⁻ EPIs were enriched for
posterior and primitive streak genes, whereas T⁻HESX1⁺ EPIs showed
upregulated early anterior genes (Extended Data Fig. 5i, Supplementary
Table 3).

Using serial sections of whole embryos, we observed that some
14-d.p.f. embryos formed a cell dissemination region, in which some
T⁺ EPIs focally migrated from the embryonic disc, disrupted barriers
formed by PrE (with N-cadherin expression), invaded the space near
the visceral endoderm and co-expressed GATA6 (Fig. 3g, h, Extended
Data Figs. 6, 7). FLK1 (also known as KDR), a marker of extra-embryonic
mesoderm, was highly expressed in the EPIs, but lost in T⁺ cells migrating
from the embryonic disc (Extended Data Fig. 5j), which suggests
that the latter are not extra-embryonic mesoderm. Some FLK1⁺ cells localized
between PrEs and TrBs, indicating extra-embryonic mesenchyme
(Extended Data Fig. 5k). Compared with post-implantation EPIs, PSA-EPIs
significantly upregulated primitive streak genes (Extended Data
Fig. 5l). However, the absence of neural gene expression (Extended Data
Fig. 5m-o) suggests that the 14-d.p.f. embryos had not yet developed far
enough to generate the initial nervous system. We conclude that our 14-d.p.f.
embryos were at the PSA stage, which meets the internationally recognized
ethical limit for human embryo culture. Together, these results indicate
that one population of 14-d.p.f. EPIs underwent changes and initiated
anterior-posterior polarity and primitive streak formation.

Continuous cell proliferation is key to evaluating embryonic developmental
status. Cell proliferation in 2D-cultured embryos occurs only
within 8-10 d.p.f., but not after 10 d.p.f.², which implies that 2D embryos
do not survive much beyond 14 days. Our 3D embryos maintained continuous
proliferation of EPIs, TrBs and PrEs at all stages (Extended Data
Fig. 5p-r). We predict that continuous cell proliferation may enable
human embryos to develop beyond 14 d.p.f. to initiate gastrulation.


Development of the trophoblast lineage 


The human placenta consists of three major TrB subpopulations: CTBs,
EVTs and STBs. In our 3D-cultured embryos, the TrBs surrounding EPIs
and PrEs presented a polarized localization of F-actin (Extended Data
Fig. 8a). CK7⁺ cells near the EPI-PrE bilayer had a single nucleus that
expressed TEAD4 and E-cadherin (Extended Data Fig. 8a-c), which
indicates a CTB identity. Multinucleated cells in the layer adjacent to
AME expressed hCG (Extended Data Fig. 8d, e), which suggests STB
identity. One outer-layer population highly expressed HLA-G, a marker
of EVTs, in 12-d.p.f. embryos, which significantly increased in 14-d.p.f.
embryos (Extended Data Fig. 8f, g). Mutually exclusive expression of
hCG, TEAD4 and HLA-G in most cells showed molecular and physical
delineation of three TrB types at 12-14 d.p.f. (Extended Data Fig. 8h, i).

We identified the top 2,603 most variable genes across the ICM and
all TrBs using scRNA-seq data. On the basis of developmental time and
markers, t-SNE analysis categorized them into six populations: pre-CTBs
(TEAD4⁺HLA-G⁻), post-CTBs, early STBs (CGB⁺CSH1⁻HLA-G⁻),
STBs, early EVTs (HLA-G⁺CSH1⁺MMP2⁺ERBB2⁺) and EVTs (Extended
Data Fig. 8j-l). Continuous transcriptome shifts from 6 to 14 d.p.f.
revealed a stepwise developmental progression, in which CTBs produce
EVTs and STBs, and segregation initiated at 9 d.p.f. and completed at
12 d.p.f. (Extended Data Fig. 8l, m). Notably, CDX2 expression quickly
decreased after TrB specification at 7 d.p.f. (Extended Data Fig. 8k),
consistent with the absence of CDX2 expression in human peri-implantation
embryos and trophoblast stem cells.

Because the human placenta must secrete steroids or polypeptide
hormones to maintain pregnancy, we analysed the expression of
120 polypeptide hormone genes produced by TrBs (Extended Data
Fig. 8n). STBs, which primarily produce placental hormones, expressed
31 hormone genes starting from 8 d.p.f. EVTs expressed 19 polypeptide
hormone genes, and their expression increased over the course of culture, which
indicates maturation. Expression of CGA, PGF and CGB family genes in
EVTs suggested that pre-gastrulation EVTs can secrete hCG, progesterone
and oestrogen. However, these genes are significantly downregulated
in 8-week EVTs, and completely disappear in 24-week EVTs. Thus, the
ability of EVTs to secrete hormones gradually decreases during placental
development.

Next, we identified genes corresponding to the different TrB subtypes
(Extended Data Fig. 8o). CTB-, STB- and EVT-specific genes were closely
associated with the functions and specific signalling pathways that
characterize each subtype (Supplementary Table 4). EVT-specific
genes helped to regulate the immune system and angiogenesis. These
results corroborate the finding that EVTs in human first-trimester placentas
are crucial for immunomodulation and spiral artery remodelling
at the early maternal-fetal interface. We determined the top-ranked
transcription factors that control TrB development. Transcription
factors for CTBs, STBs and EVTs included well-documented TrB and
pluripotency factors and new potential transcription factors, such as
MYBL2, TCF7L1 and NR2F2 (Extended Data Fig. 8o).


Epiblast development and transition 


We analysed EPI transcriptome dynamics across development, which 
revealed four main clusters: ICM, pre-implantation EPI (pre-EPI), post- 
EPI and PSA-EPI (Fig. 4a, b). When tracking naive and primed pluripotent 
gene expression, we found that embryos lost naive genes TFCP2L1, 
KLF17 and KLF4 and activated primed gene CD24 after implantation, 
while maintaining general pluripotent genes (Extended Data Fig. 9a-d). 
scRNA-seq data confirmed the EPI pluripotent state transition (EPST)
(Extended Data Fig. 9e, f). 

We asked whether EPIs from different developmental stages
show distinct pluripotency regulatory networks by performing PluriNetWork
analysis using existing mouse databases. Different combinations
of key pluripotency regulators dominated and coordinated
EPST networks (Extended Data Fig. 9g-j). Naive pluripotency genes
(ESRRB, KLF4 and TFCP2L1) only occupied ICM networks, which suggests
that human EPIs quickly lose naive pluripotency after lineage
diversification, consistent with monkey EPIs that only transiently
express naive pluripotency before the late- and hatched-blastocyst
stages. Genes specific for EPIs at different developmental stages revealed
that EPIs maintained a stable transcriptome from pre-implantation to 
post-implantation with marked changes in gene expression occurring 
during the pre-EPI transition and PSA initiation stage (Fig. 4c). Differ- 
ent EPI pluripotency states were dominated by different transcription 
factors and regulatory pathways (Fig. 4c, Supplementary Table 5). 
Pairwise comparisons showed similar data (Extended Data Fig. 9k-n,
Supplementary Table 6).

To compare cynomolgus and human EPI development, we analysed
scRNA-seq data from monkey 6-17-d.p.f. EPIs. Principal component analysis
(PCA) revealed that cynomolgus and human cells separated along
PC1, representing a major species difference (Extended Data Fig. 10a).
We identified genes with significantly positive or negative PC1 scores for
cynomolgus and human genes (Extended Data Fig. 10b, Supplementary
Table 7). Along the PC2 and PC3 axes, cynomolgus and human EPI
cells were plotted to reflect their similar developmental transitions
and conserved signalling pathways and Gene Ontology (GO) terms
(Extended Data Fig. 10a, c, Supplementary Table 7). To examine species
differences, we compared the dynamics of naive genes and signalling
pathways over EPST in human and reported monkey data. In contrast
to human EPST, TBX3 and SOX15 gradually decreased, and UTF1
and NR0B1 were absent during monkey EPST (Fig. 4c, Extended Data
Fig. 10d, e). Pathway analysis for upregulated genes during the EPST
revealed a similar enrichment trend for NOTCH, BMP and FGF signalling
pathways in humans and monkeys (Extended Data Fig. 10f-h).
However, LEFTY1, LEFTY2 and NODAL in BMP signalling displayed different
patterns between human and monkey EPST (Extended Data
Fig. 10f). Together, human and monkey EPIs had unique phenotypes
and similarities during development.


Discussion 


Here, we report a 3D-culture system that supports the growth of human
blastocysts to the PSA stage. These 3D embryos recapitulate
almost all key 3D architectures and developmental landmarks of
in vivo pre-gastrulation embryos. By contrast, many 3D structures
and developmental landmarks (such as AME-EPI separation, basement
membrane, SYS, anterior visceral endoderm, initiation of anterior-posterior
polarity and PSA) were not found using in vitro implantation
platforms of 2D-cultured embryos or using human pluripotent
stem cells that model early embryonic developmental events. Our
3D-cultured embryos recapitulated the timing and outcome of lineage
segregation and development, which more authentically mimicked
early human embryonic development in vivo (Fig. 4d).

Because human embryogenesis is not well understood, we established
a platform to delineate EPIs, AMEs, PrEs and TrBs. We revealed
the unique characteristics of AME as the first differentiated cell group
emerging from an expanding EPI population. We uncovered specific
pathways and transcription factors for TrB subtype separation and
functional differences between subtypes and between pre-gastrulation
TrBs and fetal TrBs. Unlike in mice, the human EPI after implantation maintains
its transcriptional properties for a steady and prolonged period
while acquiring properties for neuron differentiation and vasculature
development. Overall, we reveal the molecular and morphogenetic
developmental landscape of pre-gastrulation human embryos. These
data provide crucial insights into the pluripotency of human pluripotent
stem cells and uncover stem-cell self-renewal and differentiation
processes, and will inform future strategies to improve in vitro fertilization
success rates.

Methods


Ethics statement 

This work was approved by the Medicine Ethics Committee of The First 
People’s Hospital of Yunnan Province (2017LS[K]NO.035). All donated 
embryos in this study were surplus frozen embryos from couples who
already had at least one healthy baby after in vitro fertilization clinic 
treatment. The informed consent process for embryo donation complied
with the International Society for Stem Cell Research (ISSCR) ‘Guidelines
for Stem Cell Research and Clinical Translation (2016)’ and the ‘Ethical
Guidelines for Human Embryonic Stem Cell Research (2003)’ jointly
issued by the Ministry of Science and Technology and the Ministry of
Health of the People’s Republic of China. The Medicine Ethics Committee
of The First People’s Hospital of Yunnan Province is composed of nine 
members, including lawyers, scientists and clinicians with relevant 
expertise. The Committee evaluated the scientific merit and ethical 
justification of this study and conducted a full review of the donations 
and use of these samples. All donor couples signed informed consents 
for voluntary donations of surplus embryos for human embryo devel- 
opment study at the Department of Reproductive Medicine in the First 
People’s Hospital of Yunnan Province. No financial inducements were 
offered for the donations. In the process, couples were informed that 
their embryos would be used to study the developmental mechanisms 
of human embryos and that their donation would not affect their in vitro
fertilization cycle. The culture of all embryos was terminated at day 14 
after fertilization or upon the appearance of primitive streak anlage. 


Embryo thawing and zona pellucida removal 

Before embryo thawing, human embryo culture medium G-2 (10132, 
Vitrolife) was equilibrated in a 4-well plate (176740, Nunc) overnight. 
First, 0.5 ml G-2 medium was added into the well and 0.25 ml mineral oil 
(10029, Vitrolife) was used to cover the G-2 medium. Human blastocysts 
(5-6 d.p.f.) were thawed using the Kitazato Thawing Media Kit (VT102,
Kitazato Corporation) following the manufacturer’s instructions.
After culture in drops of the equilibrated G-2 medium for 4 h, embryos
were transferred to acidic Tyrode’s solution (T1788, Sigma-Aldrich) to 
remove the zona pellucida. Once the zona pellucida vanished, embryos 
were immediately transferred to G-2 medium. Embryos were washed in 
G-2 medium twice and then transferred to the in vitro culture medium. 


Evaluation of embryo quality 

According to Gardner’s scoring system, thawed blastocysts were
given numerical scores from 1 to 6 based on their degree of expansion and
hatching status. Blastocysts with an expansion and hatching status
above 3 and with a visible inner cell mass above grade B were included in
the study. On the basis of morphology, healthy embryos had to meet
the following two requirements: obvious expansion during culture,
and absence of obviously dead or broken (fragmented) cell mass during
development. Embryos that did not meet both requirements were excluded from this study.


In vitro 3D culture of human embryos
Embryos without zona pellucida were cultured in a low-attachment
96-well plate (3474, Corning) with 1 embryo and 150 μl blastocyst-culture
medium in each well. The embryo culturing conditions were as
follows: 37.2 °C, 6% CO2 and saturated humidity. The culture protocol
is summarized in Fig. 1a. First, at 6-8 d.p.f., the culture medium was
modified in vitro culture medium 1 (mIVC1). At 8 d.p.f., 50% of the mIVC1
medium was replaced by mIVC2. At 9 d.p.f., embryos were transferred
to new wells in mIVC2 including 10% Matrigel (354234, Corning). Then,
50% of the culture medium was replaced by new mIVC2 including 10% Matrigel
every day. mIVC1 and mIVC2 were pre-equilibrated in the incubator for
at least 6 h before use. A step-by-step protocol is also available
at Protocol Exchange.
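The medium schedule described above can be written as a small lookup, which may help when planning daily medium changes. This is only an illustrative sketch: the function name `medium_for_day` and the wording of the returned strings are assumptions, and the code encodes nothing beyond the days and percentages stated in the text.

```python
def medium_for_day(dpf: int) -> str:
    """Return the medium step for a given day post-fertilization (d.p.f.).

    Hypothetical helper: it only restates the schedule given in the text
    (mIVC1 from 6 d.p.f., half-change to mIVC2 at 8 d.p.f., transfer into
    mIVC2 + 10% Matrigel at 9 d.p.f., then daily half-changes).
    """
    if not 6 <= dpf <= 14:
        raise ValueError("the schedule in the text covers 6-14 d.p.f. only")
    if dpf < 8:
        return "mIVC1"
    if dpf == 8:
        return "replace 50% of mIVC1 with mIVC2"
    if dpf == 9:
        return "transfer to new well: mIVC2 + 10% Matrigel"
    return "replace 50% with fresh mIVC2 + 10% Matrigel"


if __name__ == "__main__":
    for day in range(6, 15):
        print(f"{day} d.p.f.: {medium_for_day(day)}")
```

Culture was terminated at 14 d.p.f., so the entry for that day is included only for completeness.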

The components of the mIVC1 and mIVC2 system are modifications
of IVC1 and IVC2, established for culturing human attached embryos
on the dish. mIVC1: advanced DMEM/F12 (12634-010, Thermo Fisher
Scientific) supplemented with 20% (v/v) DFBS (defined fetal bovine
serum) (bs-0003, Biosera), 2 mM L-glutamine (25030, Thermo Fisher
Scientific), 1x ITS-X (51500-056, Thermo Fisher Scientific), 8 nM
β-oestradiol (E8875, Sigma-Aldrich), 200 ng ml⁻¹ progesterone (P0130,
Sigma-Aldrich), 25 μM N-acetyl-L-cysteine (A7250, Sigma-Aldrich),
0.22% (v/v) sodium lactate (L7900, Sigma-Aldrich), 1 mM sodium pyruvate
(P4562, Sigma-Aldrich) and 10 μM Y27632 (S1049, Selleck). mIVC2:
advanced DMEM/F12 supplemented with 30% (v/v) KOSR (knockout
serum replacement) (A3181501, Thermo Fisher Scientific), 2 mM
L-glutamine, 1x ITS-X, 8 nM β-oestradiol, 200 ng ml⁻¹ progesterone,
25 μM N-acetyl-L-cysteine, 0.22% (v/v) sodium lactate, 1 mM sodium
pyruvate and 10 μM Y27632.
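As a worked illustration of how the final concentrations above translate into pipetting volumes, the sketch below applies the standard dilution relation C1 x V1 = C2 x V2. The function names are hypothetical, and the stock concentrations used in the example (200 mM L-glutamine, 10 mM Y27632) are assumptions for illustration only; the text specifies final concentrations but not the stocks that were used.

```python
def stock_volume(final_conc: float, stock_conc: float, final_volume: float) -> float:
    """Volume of stock needed, from C1 * V1 = C2 * V2.

    Both concentrations must be in the same unit; the result is in the
    unit of final_volume.
    """
    if stock_conc <= 0 or final_conc < 0 or final_conc > stock_conc:
        raise ValueError("need 0 <= final_conc <= stock_conc and stock_conc > 0")
    return final_conc * final_volume / stock_conc


def percent_volume(percent_vv: float, final_volume: float) -> float:
    """Volume of a component specified as % (v/v) of the final volume."""
    return percent_vv * final_volume / 100.0


if __name__ == "__main__":
    total_ul = 10_000.0  # 10 ml of mIVC1, an arbitrary batch size
    print("DFBS, 20% (v/v):", percent_volume(20, total_ul), "ul")
    print("sodium lactate, 0.22% (v/v):", percent_volume(0.22, total_ul), "ul")
    # Stock concentrations below are assumptions, not taken from the text.
    print("L-glutamine, 2 mM from 200 mM stock:", stock_volume(2, 200, total_ul), "ul")
    print("Y27632, 10 uM from 10,000 uM stock:", stock_volume(10, 10_000, total_ul), "ul")
```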


Optimizing the Matrigel concentration 

To identify the optimal Matrigel concentration for culturing human blastocysts
under 3D conditions, we designed the following four experimental
groups: group 1, W/O Matr: human embryos were cultured on a low-attachment
plate without Matrigel embedding up to 14 d.p.f.; group
2, 25% Matr: cultured human embryos were embedded in 25% Matrigel
at 9 d.p.f.; group 3, Matr+10% Matr: human embryos were embedded
in 10% Matrigel and transferred into a new well pre-coated with 100%
Matrigel (30 μl) at 9 d.p.f.; group 4, 10% Matr: human embryos were
embedded in 10% Matrigel at 9 d.p.f. Culture medium was mIVC1 on
6-7 d.p.f., then 1:1 mIVC1:mIVC2 on 8 d.p.f. and mIVC2 on 8-14 d.p.f. For
Matr+10% Matr, 30 μl of undiluted Matrigel was added into one well
of a low-attachment 96-well plate at 9 d.p.f. and left for 30 min in the incubator.
Once the Matrigel solidified, 120 μl of mIVC2 with 10% Matrigel was
added on the surface of the solidified Matrigel. A single embryo was
transferred to a new well after equilibration for 6 h.


Frozen section staining and taking photographs 

Embryos were fixed with 4% paraformaldehyde, washed three times with
PBS, dehydrated in 15% sucrose for 1 min and embedded in OCT. Embedded
embryos were sectioned using a Leica cryostat at a thickness of
10-12 μm. Before staining, the slides were washed with PBS to clear OCT,
and permeabilized with 0.2% Triton X-100 for 30 min at room temperature.
After blocking with 3% BSA in PBS for 4 h at room temperature,
sections were incubated with primary antibodies at 4 °C overnight and
then washed three times with 0.05% Tween-20. The following primary
antibodies were used: mouse anti-OCT3/4 (Santa Cruz, SC5279, C-10,
H1612, 1:400), rabbit anti-Brachyury (T) (Santa Cruz, SC20109, polyclonal,
A0616, 1:50), rabbit anti-SOX2 (Millipore, AB5603, polyclonal,
2826070, 1:400), goat anti-SOX17 (R&D Systems, AF1924, polyclonal,
KGA0815042, 1:250), rabbit anti-KLF4 (Millipore, 09-821, polyclonal,
2383578, 1:400), rabbit anti-KLF17 (Atlas Antibodies, HPA024629,
polyclonal, C117502, 1:250), rabbit anti-β-catenin (Abcam, AB32572,
E247, GR184212-37, 1:300), mouse anti-E-cadherin (Abcam, AB76055,
M168, GR299147-4, 1:100), rabbit anti-cytokeratin 7 (CK7) (Abcam,
AB181598, EPR17078, GR3214132-10, 1:300), rabbit anti-N-cadherin
(Abcam, AB12221, polyclonal, 40975, 1:200), mouse anti-OTX2 (Santa
Cruz, SC514195, D-8, GO816, 1:100), goat anti-LEFTY1 (R&D Systems,
AF746, polyclonal, CMMO111101, 1:100), goat anti-SOX1 (R&D Systems,
AF3369, polyclonal, XUV0618081, 1:500), rabbit anti-PAX6 (Biolegend,
901301, Poly 19013, b267205, 1:500), rabbit anti-laminin (Sigma-Aldrich,
L9393, polyclonal, O28M4890V, 1:50), rabbit anti-TEAD4 (Atlas
Antibodies, HPA056896, polyclonal, R78063, 1:150), goat anti-FOXA2
(Santa Cruz, SC6554, polyclonal, D1216, 1:100), goat anti-OCT3/4 (Santa
Cruz, SC8628, polyclonal, G3201, 1:250), mouse anti-hCG (Abcam,
AB9582, 5H4-E2, GR308272-2, 1:100), rabbit anti-PRDM14 (Millipore,
AB4350, polyclonal, 2897240, 1:50), goat anti-TFCP2L1 (R&D Systems,
AF5726, polyclonal, CCUGO115021, 1:200), goat anti-NANOG (R&D
Systems, AF1997, polyclonal, KKJO514091, 1:250), mouse anti-PODXL
(R&D Systems, MAB1658, 222328, JKW0218041, 1:400), mouse anti-HLA-G
(Abcam, AB52455, 4H84, GR251679-19, 1:200), goat anti-GATA6
(R&D Systems, AF1700, polyclonal, KWT0419021, 1:200), mouse anti-EZRIN
(Sigma-Aldrich, E8897, 3C12, 117M4875V, 1:500), rabbit anti-CER1
(Sigma-Aldrich, HPA019917, polyclonal, R10176, 1:50), phalloidin
(F-actin) Alexa Fluor 488 (Thermo Fisher Scientific, A12379, direct
labelled, 1749905, 1:300), rabbit anti-HESX1 (Abcam, AB246949, polyclonal,
GR3267093-1, 1:100), goat anti-Brachyury (T) (R&D Systems,
AF2085, polyclonal, KQP0617031, 1:200), goat anti-FLK1 (R&D Systems,
AF357, polyclonal, CVE0617081, 1:100) and WGA Alexa
Fluor 647 (Invitrogen, W32466, direct labelled, 1988457, 1:500). The
secondary antibodies were incubated at room temperature for 2 h,
and the slices were washed three times with 0.05% Tween-20. Images
were acquired using a Leica SP8 laser confocal microscope.


Whole-embryo staining and 3D reconstruction 

Fixed embryos were permeabilized with 0.5% Triton X-100 in PBS overnight
at 4 °C. Embryos were blocked with 3% BSA in
PBS for 4 h at room temperature and then transferred to a new well.
Embryos were incubated with primary antibodies for 16-18 h at 4 °C.
Embryos were washed three times in PBS including 0.05% Tween-20,
and incubated with the secondary antibodies for 4 h at room temperature.
Embryos were washed three times in PBS including 0.05% Tween-20
and transferred to a well of an 8-well IbiTreat μ-plate (IB-80826, Ibidi
GmbH) containing 60% glycerol aqueous solution for
imaging.

To make 3D videos, a Leica TCS SP8 DIVE multiphoton microscope
was used. Embryos were mounted in 80% glycerol aqueous
solution. Imaging was performed with an HC Fluotar VISIR 25x/1.00
NA CLARITY-optimized immersion objective with motorized correction
ring (Leica Microsystems). All z-stack images were acquired
at a 1,024 x 1,024 pixel resolution and with a z-step of 1 μm. Three-dimensional
data were deconvoluted with the Lightning module of LAS X
software (Leica Microsystems). Because of the limited penetration depth
of the multiphoton microscope, we acquired stacks 200 μm thick along the z axis.


Isolation of single cells 

Embryos were washed in PBS three times, washed in 0.25% trypsin
(T4799; Sigma-Aldrich) twice and incubated with 0.25% trypsin for 15 min
at 37 °C; digestion was then terminated with DFBS. Embryos were dissociated
into single cells by repeated pipetting and dispersed in 1% DFBS in PBS. A single
cell was pipetted into a PCR tube. All of the above operations were performed
under a Nikon SMZ645 microscope.


Cell number counts and embryo diameter measurements 

To count cell numbers, two protocols were used. In the first, the whole
embryo was dissociated into single cells by digestion with 0.25% trypsin,
and the total cell number per embryo was counted. The second was
used to count numbers of specific cell types. After staining, cell numbers
of OCT4⁺ EPIs, GATA6⁺ PrEs and CK7⁺ TrBs were analysed with ImageJ
software (v.1.51j8). To measure the diameter of developing human
embryos, embryos were photographed every day and their diameters
were measured with ImageJ.


RNA sequencing of single cells 

Isolated single cells were washed in DPBS (Gibco 14190-144) and picked
into lysis buffer using Pasteur pipettes under a dissecting microscope.
The synthesis and amplification of full-length cDNAs were performed following
the Smart-seq2 protocol. In brief, reverse transcription reactions and
pre-amplification were performed using SuperScript II (Invitrogen
18064-014) and KAPA HiFi HotStart ReadyMix (KAPA Biosystems
KK2601), respectively. The quality of the cDNAs was evaluated using a Bioanalyzer
2100. Library construction and sequencing were performed
by Annoroad Gene Technology (http://www.annoroad.com/) or BGI
(https://www.bgi.com/). Sequencing was performed on an Illumina
X-ten platform or a BGISEQ-500 sequencing platform (BGI). Paired-end
reads were obtained, and the number of reads was more than 7 million
for every individual cell.


Quality control, alignment of the scRNA-seq profiles and 
stringent filtering 
The sequencing quality of the 557 scRNA-seq profiles was examined with
FASTQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
and MULTIQC (v.1.6). The annotation of RefSeq genes was
downloaded from the UCSC Genome Browser. RefSeq exons were used
to build databases of exon and splice sites with the extract_exon.py
and extract_splice_sites.py scripts in the hisat2 package. HISAT2 (v.2.1.0)
was used to align the scRNA-seq profiles to the human genome. The
alignment results of HISAT2 in the SAM format were converted to BAM
format and sorted with SAMTools (v.1.1). Stringtie (v.1.3.4) was used
to calculate the abundances of genes (in FPKM) annotated in GENCODE
(v.29)⁴⁸ using the options ‘-G gencode.v29.gtf -B -e -v’. Because
cells with a limited number of expressed genes potentially result
from RNA degradation, two scRNA-seq profiles in which fewer than 2,000
genes had abundance levels of more than 1 FPKM were
excluded from further analysis. Basic information on the 555 remaining
scRNA-seq profiles is available in Supplementary Table 8.
Qualimap 2 (v.2.2.2-dev) was used to calculate the number of
reads mapped to the genes in GENCODE with the options
‘--java-mem-size=40G comp-counts -bam -pe’. Then, we prepared a plot of the
number of genes versus sequencing depth (the number of sequencing
reads in the scRNA-seq library) with the command ‘qualimap
counts -d -kS’.
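The profile-level filter described above (discard a cell when fewer than 2,000 genes exceed 1 FPKM) can be sketched as follows. This is a minimal illustration rather than the code used in the study; the function names and the representation of a profile as a plain list of FPKM values are assumptions.

```python
def count_expressed(fpkm_values, threshold=1.0):
    """Number of genes whose abundance is strictly above the FPKM threshold."""
    return sum(1 for value in fpkm_values if value > threshold)


def passes_qc(fpkm_values, min_genes=2000, threshold=1.0):
    """Keep a profile only if at least min_genes genes exceed the threshold."""
    return count_expressed(fpkm_values, threshold) >= min_genes


if __name__ == "__main__":
    # Toy profiles (hypothetical values): one rich, one degraded.
    good_profile = [5.0] * 6000 + [0.0] * 14000
    poor_profile = [5.0] * 1500 + [0.2] * 18500
    print(passes_qc(good_profile), passes_qc(poor_profile))
```

Applied to all 557 profiles, a filter of this form would leave the 555 profiles reported in the text.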


The t-SNE and trajectory analysis of the scRNA-seq profiles
Genes with dispersion values of at least 0.5 in a particular cell type were
selected with the Seurat package (v.2.3.4) in R. The top variable genes
were used to classify cells with the FindClusters function in the Seurat
package. Cell types were defined by expression of selected markers.
Genes filtered with Seurat were used to perform t-SNE analysis for the
cell types under consideration using the RunTSNE function in the Seurat
package of R. Monocle (v.2.4.0) was used to perform a trajectory
analysis for the cell type under consideration. The heatmap function
of the R platform was used to generate the heat map of selected marker
genes. Because AME did not express pluripotency genes (or expressed
them only at low levels), we sorted cells from the ICM and EPI clusters by
NANOG expression when analysing the dynamics of pluripotency and primitive
streak genes, to exclude the AME and intermediate state cells. One
hundred and thirty-six cells with FPKM values of NANOG of more than 1
were specifically used for violin plots of pluripotency genes and gene
regulatory network analysis.


Identifying genes related to different cell types 

A feature vector of one cell type was defined as a binary vector with
values of 1 for cells of the type under consideration and 0 for cells of other
types. Genes with Pearson’s correlation coefficients of at least 0.4 with
the feature vector for a particular type of cell were regarded as genes
related to the cell type under consideration. One hundred and thirty-six
cells with FPKM values of NANOG of more than 1 were specifically used for
violin plots of pluripotency genes and gene regulatory network analysis
(the 136 reclassified cells were used in Extended Data Figs. 5g, l and 9e, f-i).
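The feature-vector criterion can be illustrated with a short, dependency-free sketch. It builds the binary vector for one cell type and keeps genes whose Pearson correlation with that vector is at least 0.4. The function names, the dictionary layout and the toy expression values are hypothetical and serve only to show the calculation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def related_genes(expr, cell_types, target, min_r=0.4):
    """Genes correlated (r >= min_r) with the binary feature vector of `target`."""
    feature = [1.0 if label == target else 0.0 for label in cell_types]
    return [gene for gene, values in expr.items() if pearson(values, feature) >= min_r]


if __name__ == "__main__":
    cell_types = ["EPI", "EPI", "TrB", "TrB"]  # toy labels
    expr = {  # toy expression values per cell, not real data
        "geneA": [10.0, 12.0, 0.0, 1.0],
        "geneB": [0.0, 1.0, 9.0, 11.0],
        "geneC": [5.0, 5.0, 5.0, 5.0],
    }
    print(related_genes(expr, cell_types, "EPI"))
```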


Comparisons with publicly available scRNA-seq from pre- 
implantation human embryos 

The previously described analytical strategies and datasets were used
to combine and analyse our scRNA-seq data from 6-9-d.p.f. embryos
and the data (later blastocyst or 6-7-d.p.f. blastocysts) from three previous
studies with PCA. The 12 previously described lineage
marker genes were used in the PCA analysis. The combined dataset
was aligned to the genome to calculate the FPKM values of genes, and analysed
with Seurat (v.2.3.4).


Identification of differentially expressed genes between groups 
Differentially expressed genes between two groups were obtained by
using edgeR. Genes with uncorrected P values (likelihood ratio tests)
smaller than 0.01, absolute fold difference larger than 2 and median
FPKM larger than 1 in one group were regarded as differentially
expressed genes.
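The three thresholds can be combined into a single predicate, sketched below. The P value and fold change are taken as inputs (in the study they come from edgeR, which is not reimplemented here). The function name is hypothetical, and two readings are assumptions: "absolute fold difference" is treated as the ratio in whichever direction exceeds 1, and "in one group" as "in at least one of the two groups".

```python
from statistics import median


def is_differentially_expressed(p_value, fold_change, fpkm_a, fpkm_b,
                                max_p=0.01, min_fold=2.0, min_median=1.0):
    """Apply the three cut-offs stated in the text to one gene.

    fold_change is the ratio group A / group B and must be positive.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    # Assumption: the absolute fold difference is direction-independent.
    abs_fold = fold_change if fold_change >= 1 else 1.0 / fold_change
    # Assumption: the median must exceed the cut-off in at least one group.
    expressed = max(median(fpkm_a), median(fpkm_b)) > min_median
    return p_value < max_p and abs_fold > min_fold and expressed


if __name__ == "__main__":
    print(is_differentially_expressed(0.001, 4.0, [5.0, 6.0, 7.0], [1.0, 1.0, 2.0]))
```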

For HESX1⁺T⁻ and HESX1⁻T⁺ cells, EPIs at 14 d.p.f. were classified into two
groups, HESX1⁺T⁻ and HESX1⁻T⁺, based on HESX1 and T expression in
these cells. Cells positive for T (FPKM > 1) and negative for HESX1 (FPKM
< 1) belonged to the HESX1⁻T⁺ group, and cells negative for T (FPKM < 1)
and positive for HESX1 (FPKM > 1) belonged to the HESX1⁺T⁻ group. The
lists of down- and upregulated genes in HESX1⁺T⁻ cells compared to
HESX1⁻T⁺ cells are shown in Supplementary Table 3.
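The grouping rule above amounts to a two-threshold classification, shown here as a minimal sketch. The function name and the returned labels are illustrative; cells that are positive for both genes, negative for both, or exactly at the cut-off fall into neither group and are returned as `None`.

```python
def classify_epi(t_fpkm, hesx1_fpkm, cutoff=1.0):
    """Assign a 14-d.p.f. EPI cell to a group from its T and HESX1 FPKM values."""
    if t_fpkm > cutoff and hesx1_fpkm < cutoff:
        return "HESX1-T+"
    if t_fpkm < cutoff and hesx1_fpkm > cutoff:
        return "HESX1+T-"
    return None  # double-positive, double-negative or borderline cells


if __name__ == "__main__":
    # Toy (T, HESX1) FPKM pairs, not real data.
    for t, hesx1 in [(12.0, 0.1), (0.0, 8.5), (3.0, 4.0), (0.2, 0.3)]:
        print(t, hesx1, classify_epi(t, hesx1))
```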


Co-expression gene network analysis 

In total, 181 scRNA-seq profiles of 7-9-d.p.f. blastocysts (in Supplementary
Table 1) were used to construct co-expression gene networks (11
cells were not included owing to high expression of two lineage-marker
genes). These 181 cells were assigned as EPI, PrE and TrB cells based on
expression of selected marker genes. We filtered the genes by keeping
those with log2-scaled expression (FPKM + 1) values larger than 4 in at
least one sample and with correlation coefficients of at least 0.4 with
the feature vector of one of the three cell types (that is, EPI, PrE and TrB).
Filtered genes were used to construct co-expression networks of genes
with WGCNA. Three gene modules with the highest correlation coefficient
values in the three cell types were used to draw gene modules
for each cell type. Only transcription factors in each module were kept
when visualizing the three gene modules with Cytoscape (v.3.6.1).
The eigengene value matrix between each scRNA-seq profile and the
nine identified modules calculated by WGCNA analysis was used to
perform a 2D hierarchical clustering and visualized with the pheatmap
function of the R platform.
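The expression part of the gene filter used before network construction can be sketched as below. A gene is kept when log2(FPKM + 1) exceeds 4 in at least one sample, which is equivalent to an FPKM above 15. The function names and toy values are assumptions; the correlation criterion and the WGCNA step itself are not reproduced here.

```python
import math


def log2_expression(fpkm):
    """log2-scaled expression, log2(FPKM + 1)."""
    return math.log2(fpkm + 1.0)


def filter_genes(expr, min_log2=4.0):
    """Keep genes whose log2(FPKM + 1) is above min_log2 in at least one sample."""
    return [gene for gene, values in expr.items()
            if any(log2_expression(v) > min_log2 for v in values)]


if __name__ == "__main__":
    expr = {  # toy FPKM values per sample, not real data
        "low": [0.0, 3.0, 15.0],    # log2(16) = 4, not strictly above the cut-off
        "high": [0.0, 16.0, 2.0],
        "burst": [100.0, 0.0, 0.0],
    }
    print(filter_genes(expr))
```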


GO and KEGG pathway analysis 

Enriched GO terms and KEGG pathways for the genes related to different
types of cells were identified using KOBAS3. Significant GO
terms and KEGG pathways were visualized with the ggplot function in
the ggplot2 package in R.


Gene regulatory network analysis 

The PluriNetWork was trimmed by keeping genes and their relations
if the genes had at least 10 FPKM in 60% of the cells of an EPI subtype or if
their expression levels were three times higher in that EPI subtype
than in other cells. The trimmed networks were visualized with
Cytoscape (v.3.6.1).
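
The trimming rule can be illustrated with a short Python sketch. Several details are assumptions rather than statements from the text: "three times higher" is read as a ratio of mean FPKM values, a relation is kept only when both of its genes pass, and the gene names and values are placeholders.

```python
def gene_passes(subtype_fpkm, other_fpkm, min_fpkm=10.0, min_fraction=0.6, fold=3.0):
    """Check one gene against the two alternative criteria."""
    # Criterion A: at least 10 FPKM in at least 60% of the subtype cells.
    fraction = sum(v >= min_fpkm for v in subtype_fpkm) / len(subtype_fpkm)
    if fraction >= min_fraction:
        return True
    # Criterion B (assumed to mean a ratio of means): three times higher
    # expression in the subtype than in the other cells.
    mean_subtype = sum(subtype_fpkm) / len(subtype_fpkm)
    mean_other = sum(other_fpkm) / len(other_fpkm)
    return mean_subtype > fold * mean_other


def trim_network(edges, expression):
    """Keep only relations whose two genes both pass (an assumption)."""
    kept = {g for g, (sub, other) in expression.items() if gene_passes(sub, other)}
    return [(a, b) for a, b in edges if a in kept and b in kept]


# Placeholder genes with made-up FPKM values: (subtype cells, other cells).
expression = {
    "GENE_A": ([50, 40, 30, 5, 60], [1, 2, 1]),
    "GENE_B": ([12, 15, 9, 11, 20], [10, 12, 11]),
    "GENE_C": ([1, 0, 2, 1, 0], [5, 6, 4]),
    "GENE_D": ([9, 9, 9, 9, 9], [1, 1, 1]),
}
edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_B", "GENE_D")]
trimmed = trim_network(edges, expression)
```

In this toy network, `GENE_C` fails both criteria, so the relation that involves it is dropped.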


Comparisons of human and monkey EPI development by 
analysing scRNA-seq profiles 

The genome and annotation of cynomolgus monkey (version mfa5.0)
were downloaded from the NCBI Genome database. scRNA-seq profiles
of 213 cynomolgus monkey (Macaca fascicularis) EPI cells reported
previously (ref. 39) were aligned to the monkey genome with the same options
used when analysing human scRNA-seq profiles. The 222 human EPI and 213
cynomolgus monkey EPI scRNA-seq profiles were combined, keeping
the FPKM values of the 16,487 homologous genes. The 16,487 common
genes of human and cynomolgus monkey were filtered to keep
12,475 genes with log₂(FPKM + 1) values of at least 4 in at least one of
the 222 human or 213 cynomolgus monkey cells. Raw FPKM values of
the 12,475 genes for the 222 human and 213 cynomolgus monkey cells
were normalized with the NormalizeData function and scaled with the
ScaleData function of the Seurat package in R. Normalized and scaled FPKM
values were used to perform PCA by using the prcomp function in
R. PCA results were visualized with MATLAB (MathWorks) for the first
three principal components. The second and third components were
also used to visualize the PCA results with the ggplot function in the
ggplot2 package of R. A total of 966 homologous genes in human and
monkey that contributed highly to PC1 (with absolute PC1 values of
more than 2 s.d. of the 12,475 genes) were clustered and visualized
with the heatmap function of the R platform. The 294 and 672 genes
with PC1 values of > 2 and < -2 s.d. were used to perform GO enrichment
analysis with KOBAS3. In total, 1,152 genes with significant loading
scores for PC2 and PC3 (radius of PC2 and PC3 > 3 s.d. and none of the
1,151 genes overlapping with 996 genes that had significant scores for
PC1 loading) were clustered and visualized with the pheatmap function
of the R platform.
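
The selection of genes that contribute highly to PC1 can be sketched as follows. The snippet assumes that PC1 loadings have already been computed (for example, by prcomp), that the threshold is 2 population standard deviations of the loadings across all genes, and that loadings are measured from zero; these are illustrative assumptions, and the gene names and loading values are invented.

```python
import statistics


def select_pc1_genes(pc1_loadings, n_sd=2.0):
    """Split genes into strongly positive and strongly negative PC1 contributors."""
    # s.d. of the loadings across all genes (population s.d. is an assumption).
    sd = statistics.pstdev(list(pc1_loadings.values()))
    positive = [g for g, v in pc1_loadings.items() if v > n_sd * sd]
    negative = [g for g, v in pc1_loadings.items() if v < -n_sd * sd]
    return positive, negative


# Invented loadings: eight genes near zero and two clear outliers.
loadings = {
    "g0": 0.1, "g1": -0.1, "g2": 0.2, "g3": -0.2, "g4": 0.0,
    "g5": 0.1, "g6": -0.1, "g7": 0.0, "g8": 3.0, "g9": -3.0,
}
positive_genes, negative_genes = select_pc1_genes(loadings)
```

The two returned lists correspond to the positive and negative gene sets that are passed separately to GO enrichment analysis in the text.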


Statistical analysis 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized, and investigators were not blinded 
to allocation during experiments and outcome assessment. Errors 
and error bars represent s.e.m. from a minimum of five independent 
embryos unless otherwise indicated. Figures display representative 
results. Unless otherwise specified, the results were the same across 
all the embryos analysed. For cell number, the significance of the difference
between two samples was evaluated by unpaired two-sample Student's
t-test using Excel software (2016). For gene expression, the differences
between cell types were analysed by Wilcoxon rank-sum test. P < 0.05
was considered statistically significant.
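
For readers who want to reproduce the summary statistics, the s.e.m. and the unpaired two-sample Student's t statistic can be computed as in the sketch below. It assumes the equal-variance (pooled) form of the t-test and uses invented cell counts; it returns the t statistic and degrees of freedom only, because converting these to a P value requires the t distribution, which is not in the Python standard library.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean, using the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


def student_t(a, b):
    """Unpaired two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Invented cell numbers from five embryos at each of two stages.
stage_1 = [10, 12, 11, 13, 14]
stage_2 = [20, 22, 21, 23, 24]
t_stat, dof = student_t(stage_1, stage_2)
```

With these invented values, the pooled variance is 2.5 and the standard error of the difference is 1, so the statistic is simply the difference in means.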


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 
Extended Data Fig. 1 | Establishing a 3D blastocyst-culture system. This
figure is related to Fig. 1. a-f, Modification of human embryo culture medium.
To test human embryo development, human embryos at 5-6 d.p.f. were
cultured on a low-attachment plate without any 3D extracellular matrix up to
14 d.p.f. By sequentially culturing in IVC1 (6-8 d.p.f.) and IVC2 (8-14 d.p.f.)
media, 6.67% (2 out of 30 embryos) of embryos could survive until 14 d.p.f.
a, Schematics of improving culture medium. Sodium lactate, sodium pyruvate
and ROCK inhibitor (Y27632) were added to the IVC1 and IVC2 media, referred
to as mIVC1 (6-8 d.p.f.) and mIVC2 (8-14 d.p.f.), respectively. Culture in mIVC1
and mIVC2 enabled 25% (4 out of 16 embryos) of human blastocysts to develop
up to 14 d.p.f. (c). b, Representative developing embryos based on
morphological observation. n = 16 independent embryos from three
independent experiments. c, Representative embryos with abnormal
development. Abnormal embryos displayed growth arrest or had obvious cell
death or fragmentations. n = 13 independent embryos from three independent
experiments. d, Representative staining of abnormal embryos with CK7,
GATA6 and OCT4 at 14 d.p.f. In all six examined embryos, consistent data were
obtained. e, Staining of developing embryos (2 out of 3 embryos) with CK7,
GATA6 and OCT4 at 14 d.p.f. f, Quantification of developmental rates of human
embryos cultured in control medium (IVC1 and IVC2) and modified medium
(mIVC1 and mIVC2). n = 30 and 16 biologically independent embryos,
respectively. Developmental rates of human embryos at 11 and 13 d.p.f. were
based on the two following requirements by morphology: obvious expansion
over culture and absence of obvious cell death mass and fragmented
phenotypes. At 14 d.p.f., we determined the embryo development ratio by
staining CK7 (TrB), GATA6 (PrE) and OCT4 (EPI). g-q, Representative human
embryo development after culture under four different 3D conditions over
development. The limited number of embryos only enabled us to compare
embryo development under four conditions. As the implantation time window
is 8-10 d.p.f., we embedded embryos with Matrigel at 9 d.p.f. Embryo
development was verified on the basis of morphological observation and staining
of specific markers for OCT4, GATA6 and CK7. g-j, Top, schematics of in vitro
3D culture of human blastocysts under different culture conditions. Middle,
representative images of human embryos during development. Bottom,
representative stained images of cultured human embryos under different 3D
conditions at 14 d.p.f. g, Human embryos at 5-6 d.p.f. were cultured on a
low-attachment plate without Matrigel (W/O Matr) up to 14 d.p.f. The outermost
TrBs showed signs of apoptosis (blue arrowheads), as determined by
morphological observations and CK7 staining, which suggests that the
condition was unsuitable for TrB development and survival required for
attachment. In total, 2 of 20 embryos (three independent experiments)
survived up to 14 d.p.f. and displayed normal EPI, PrE and TrB development.
h, Human embryos were embedded in 25% Matrigel at 9 d.p.f. for continuous
culture up to 14 d.p.f. n = 25 embryos from three independent experiments. The
invasion and outgrowth of TrBs were observed and embryos became flat, which
suggests a higher concentration of Matrigel is advantageous to differentiation
and development of TrBs, as confirmed by CK7 staining. i, Human embryos
were embedded in 10% Matrigel in a new well, which was pre-coated with
100% Matrigel (30 μl) (Matr+10% Matr), at 9 d.p.f. for continuous culture up to
14 d.p.f. Although embryos displayed considerable expansion over culture,
staining with lineage markers showed that EPIs in most embryos were lost over
development. Only 1 of 33 embryos from three independent experiments grew
to 14 d.p.f. and was accompanied by EPI, PrE and TrB development. The negative
outcome may indicate that high concentrations of Matrigel can inhibit EPI
development. In h and i, red arrowheads indicate TrBs invading into Matrigel.
j, Human embryos were embedded in 10% Matrigel at 9 d.p.f. for continuous
culture up to 14 d.p.f. Compared with the 25% Matrigel and Matr+10% Matr
conditions, human embryos in 10% Matrigel increased in thickness
(z axis) and showed better 3D spatial structures. In total, 4 of 17 embryos from
three independent experiments grew to 14 d.p.f. with normal development.
k, Quantification of the mean diameter of human embryos cultured under
different 3D conditions during development by analysis of 5-14 embryos from
three independent experiments. Data are mean ± s.d. l, Quantification of
human developmental embryos during culture in different 3D conditions. Data
were based on morphological observations only. Human developing embryos
met the two following requirements: obvious expansion over culture; absence
of obvious cell death mass or fragmented phenotypes. m, Quantification of
developing embryos in different 3D conditions. n = 20 (W/O Matr), 25 (25%
Matr), 33 (Matr+10% Matr) and 17 (10% Matr) blastocysts. Developing embryos
met the following requirements: obvious expansion over culture; absence of
obvious cell death mass or fragmented phenotypes; and development of EPIs,
PrEs and TrBs and formation of amnion identified by OCT4, GATA6 and CK7
staining. Although embryos in the 25% Matr and Matr+10% Matr culture
conditions have normal morphologies, some embryos lacked OCT4⁺ EPIs or
GATA6⁺ PrEs and gave rise to a high proportion of TrBs, which suggests that high
concentrations of Matrigel could inhibit EPI development and promote TrB
proliferation. n-q, Three-dimensional construction of human embryos
cultured in 10% Matrigel (3 out of 3 embryos from three independent
experiments). n, A representative 3D reconstruction of EPIs. Inset shows OCT4
staining of one section from the same embryo. o, A representative 3D
reconstruction of SYS and amnion including an embryonic disc and an amniotic
cavity (see Supplementary Video 1). p, Three-dimensional reconstruction of
TrBs (see Supplementary Video 2). q, Three-dimensional magnification of TrBs
close to the amnion side. Scale bars, 100 μm (phase-contrast) or 50 μm
(staining).
Extended Data Fig. 2 | Representative z-series of an 8- and a 10-d.p.f. embryo.
This figure is related to Fig. 1f, g. a, Series of confocal z-sections of the 8-d.p.f.
embryo stained for OCT4 (green), GATA6 (grey) and CK7 (red). The PYS cavity
(white arrows) was surrounded by few GATA6⁺ PrEs. Similar phenotypes were
observed in 3 out of 4 embryos from three independent experiments. b, Series
of confocal z-sections of the whole embryo stained for OCT4 (green), GATA6
(grey) and CK7 (red) showing that the PYS cavity in the 10-d.p.f. embryo
becomes more distinct. Similar phenotypes were observed in 3 out of 3
embryos from three independent experiments. The yolk sac cavity (red arrows)
was surrounded by GATA6⁺ PrEs. Scale bars, 50 μm.


Extended Data Fig. 3 | Lineage delineation by transcriptome. This figure is
related to Fig. 2. a-c, Quality control of single-cell RNA-sequencing data.
a, Sequence quality was evaluated by FastQC. b, Total reads, mapped reads and
mapping ratios of 557 single cells. c, Saturation curve of sequencing.
d, e, Integrated analysis of embryonic single-cell data from different sources.
We used analytical strategies developed previously to analyse single-cell
RNA-seq data (255 single cells) from 6-9-d.p.f. embryos in this study and
single-cell RNA-seq data (216 single cells) from three previous reports
(refs 16-18) (late blastocysts or 6-7-d.p.f. blastocysts). The three datasets
have previously been analysed. d, PCA based on 12 lineage markers (NANOG,
SOX2, KLF17 and TDGF1 for EPI; PDGFRA, GATA6, GATA4 and SOX17 for PrE;
GATA3, GATA2, KRT18 and TEAD3 for TrB) showed that clear separation between
EPI, TrB and PrE could be attained for nearly all samples, including our single
cells from 6-9-d.p.f. blastocysts, which indicates that lineage delamination
occurs at 6 d.p.f. The result is consistent with previous findings. e, t-SNE
analyses using 4,333 viable genes across all samples. The samples from
previous studies were defined into four types: intermediate cells, EPI, PrE and
TrB. The combined Seurat analysis revealed that most cells mixed well,
independent of their source. Although most samples were clustered into EPI,
PrE or TrB, similar to the results using 12 lineage genes (d), some cells from
6-d.p.f. embryos remained in an intermediate state with overlapping expression
of POU5F1, GATA6, PDGFRA and GATA3. Compared with cells from 6-d.p.f.
embryos, cells from 7-9-d.p.f. embryos displayed a clearer separation. These
data showed that cell fates of 7-9-d.p.f. embryos became more fixed.
f-j, Lineage delineation by transcriptome. Analysis of genes corresponding to
EPI, PrE and TrB from 7-9-d.p.f. embryos to understand the regulators involved
in the segregation process. f, Heat map of lineage-specific genes of EPI, TrB
and PrE from 7-9-d.p.f. embryos (Supplementary Table 1). Their representative
transcription factors and KEGG pathways are shown, respectively. GO terms
and KEGG pathways showed EPI-specific genes associated with signalling
pathways regulating stem-cell pluripotency including PI3K-AKT, p53, RAP1 and
MAPK.
PrE-expressed genes were related to TGFB, PPAR and Ras signalling pathways.
TrB-specific genes contributed to Hippo, HIF, PPAR and thyroid receptor
signalling pathways. Notably, the PI3K-AKT signalling pathway was enriched in
EPIs, PrEs and TrBs. To explain the difference, we examined expression of the
PI3K-AKT signalling pathway components in the three cell types and found that
each cell type specifically expressed different genes of the PI3K-AKT pathway
(Supplementary Table 1.4). g, WGCNA dendrogram indicating different gene
modules in all single-cell samples from 7-9-d.p.f. embryos. Three major
branches corresponded to PrE (brown module), TrB (blue module) and EPI
(turquoise module). h-j, Hub-gene-network analysis of transcription factors
specific for PrE (brown module), TrB (blue module) and EPIs (turquoise
module). The size of dots represents hubness. h, Hub-gene network of the
EPI-specific gene module. In addition to well-known transcription factors
(NANOG, PRDM14, SOX2, OCT4 (also known as POU5F1), ZSCAN10 and KLF17),
new candidate factors may associate with EPI differentiation, such as VENTX,
BCL11A, PBX1 and ARGFX. i, Hub-gene network of the PrE-specific gene module.
Transcription factors highly correlated with PrE differentiation
included GATA4, SOX17, GATA6 and HNF1B. j, Hub-gene network of the
TrB-specific gene module. TrB-specific transcription factors included MYBL2,
TFAP2A, DLX6 and GCM1. k, Comparison of lineage-specific total genes
overlapping between a previous study and this study. In the previous study,
by analysing 5-7-d.p.f. embryos and combining the lineage-specific results,
439, 820 and 222 genes, which significantly maintained TrB-, EPI- and
PrE-specific genes, respectively, were identified. Comparison analysis showed
that although we identified more EPI-, PrE- and TrB-specific genes by our
resource data, core lineage transcription factors (NANOG, POU5F1 and SOX2 for
EPI; GATA6, SOX17 and GATA4 for PrE; GATA2 and GATA3 for TrB) are maintained
across different samples. The difference in gene expression may be
due to the different developmental stages of the embryos. Differences in gene
expression, including transcription factors, are summarized in Supplementary
Table 1.6.
Extended Data Fig. 4 | AME separation from EPI. a, E-cadherin expression in
amnion from a 12-d.p.f. embryo. Right panels show magnified squares (3 out of
3 embryos). White and red lines indicate nuclei and apical orientation,
respectively. b, Quantification of expression of E-cadherin (AME = 33 cells,
EPI = 44 cells) and OCT4 (AME = 22 cells, EPI = 44 cells) in columnar EPIs and
squamous AME. Data are mean ± s.d. from 3 embryos. ***P < 0.001, two-sided
t-test. c, d, Representative staining of laminin at 12 and 14 d.p.f. (3 out of 3
embryos). Right panels show magnified squares from d. e, f, Representative
staining and quantification of NANOG expression (AME = 33 cells, EPI = 30 cells)
and SOX2 expression (AME = 34 cells, EPI = 30 cells) in 14-d.p.f. EPIs and AME (3
out of 3 embryos). Data are mean ± s.d. **P < 0.01, ***P < 0.001, two-tailed t-test.
Right panels show magnified squares from e. g, h, Representative staining and
quantification of laminin and E-cadherin in 6-d.p.f. embryos (3 out of 3
embryos). i, j, Representative staining and quantification of laminin and
E-cadherin in 8-d.p.f. embryos (3 embryos). White long lines in h and j show
positions used to plot intensity profiles (right). k, l, Representative staining of
laminin and E-cadherin in 10-d.p.f. embryos (3 out of 3 embryos). Together, we
conclude that AME separation from EPIs correlates with asymmetrical
distributions of E-cadherin and laminin (a-l). m, Representative staining of
EZRIN in 14-d.p.f. amnion (2 out of 2 embryos). White arrows indicate EZRIN
expression in apical surface in EPIs and AME. n, Representative staining of WGA
in 14-d.p.f. embryos (2 out of 2 embryos). Red arrowheads indicate WGA
expression in extra-embryonic cells. o, t-SNE analyses revealed three clusters
in 12- and 14-d.p.f. embryos: AME, intermediate state cells and EPIs.
p, Violin plots show that, compared to EPIs, AME significantly downregulated
pluripotency genes and upregulated genes that are specifically expressed in the
AME of 12-17-d.p.f. monkey embryos or self-organized amnion from human
pluripotent stem cells. All violins have the same maximum width; black dot
denotes the mean. o, p, AME, n = 13 cells; intermediate state cells, n = 26 cells;
EPI, 53 cells. q-t, Gene expression profiles of AME and EPI in the 12- and
14-d.p.f. embryos (Supplementary Table 2). AME, n = 12 cells (one single cell
with high NANOG expression was not included); EPI, 53 cells. q, Volcano map of
differentially expressed genes (DEGs) between AME and EPI in the 12- and
14-d.p.f. embryos. DEGs were defined with uncorrected P < 0.01 (two-sided
likelihood ratio tests) and log₂-transformed fold change > 1 or < -1, and median
FPKM > 1 in one group. r, Heat map of DEGs between the AME and EPI. Right
panel presents representative transcription factors. s, GO terms of genes
upregulated in the AME compared to the EPIs. t, KEGG pathways of genes
enriched in the AME compared to EPI. u, hCG was expressed in the AMEs, but not
in the EPIs (2 out of 2 embryos). White arrows indicate that the AMEs have
squamous nuclear shape, expressed hCG, but downregulated the pluripotency
gene OCT4. Scale bars, 15 μm (a, l), 20 μm (c, e), 25 μm (d, g, h, k, m, n, u) or
50 μm (i).
Extended Data Fig. 5 | Human embryos at 14 d.p.f. initiate anterior-posterior
polarity and generation of PSA in 3D-culture conditions. This
figure relates to Fig. 3. a, OCT4, GATA6 and TBXT T staining of sections from a
14-d.p.f. embryo (2 out of 2 embryos), showing that T⁺ cells originated from the
EPI compartment close to the AME compartment boundary at 14 d.p.f.
b, Representative OCT4, OTX2 and SOX2 staining. Arrow indicates an OTX2⁺
cell. n = 4 of 5 embryos from three independent experiments displayed
consistent data. c-e, LEFTY1 (c, d; n = 3 out of 4 embryos from two independent
experiments) and OTX2 (e; n = 3 out of 5 embryos from two independent
experiments) immunofluorescence was only detected on one side of the 14-d.p.f.
embryonic disc. Arrows indicate LEFTY1⁺ or OTX2⁺ cells. Right image in d is
a magnification of the square in the left image. The exclusive expression of
NANOG and SOX2 was not observed in EPIs (e). f, Staining of OCT4, HESX1 and
GATA6 in the 12-d.p.f. embryos (2 out of 2 embryos). g, The violin plots show
dynamic expression of HESX1 during EPI development. All violin plots have the
same maximum width; black dot denotes the mean. h, Correlation of HESX1 and
T expression of 14-d.p.f. EPIs, as determined by scRNA-seq. Each dot
represents a single cell. i, Volcano plots show DEGs in HESX1⁺T⁻ (10 single cells)
and HESX1⁻T⁺ (12 single cells) EPIs by scRNA-seq. DEGs were defined as those
with uncorrected P < 0.01 (likelihood ratio test) and fold change of > 2 or < -2,
and median FPKM > 1 in one group. j, k, Staining of OCT4, FLK1 and T at 14 d.p.f.
(2 out of 2 embryos from two independent experiments). Red arrows denote
migrating T⁺ cells; white arrows denote FLK1⁺ extra-embryonic mesenchyme.
l, The violin plots show expression dynamics of primitive streak genes over
pluripotent-stem-cell development. All violins have the same maximum width;
black dot denotes the mean. In total, 136 cells were included (Extended Data
Fig. 9e): ICM, n = 49 cells; pre-EPI, n = 23 cells; post-EPI, n = 48 cells; PSA-EPI,
n = 16 cells. *P < 0.05, **P < 0.01, two-sided Wilcoxon rank-sum test.
m-o, Absence of specific neural gene expression indicates that 14-d.p.f. embryos
do not generate the initial nervous system, which meets the internationally
recognized ethical limit for human embryo culture. m, Violin plots of dynamic
expression of neural-specific genes in EPIs over embryo culture. All violins
have the same maximum width; black dot denotes the mean. In total, 136 cells
were included: ICM, n = 49 cells; pre-EPI, n = 23 cells; post-EPI, n = 48 cells;
PSA-EPI, n = 16 cells. n, o, Representative staining of PAX6, OCT4, SOX1 and
FOXA2 in human 14-d.p.f. embryos (3/3 embryos). p-r, Development and cell
proliferation of human embryos cultured in the Matr+10% Matr condition.
Quantified data at each stage were based on five embryos from three
independent experiments. Data are presented as mean ± s.d. p, Quantification
of the dynamics of total cell number per embryo during culture. *P < 0.05,
**P < 0.01, two-sided Student's t-test. q, Dynamics of OCT4⁺ EPIs and GATA6⁺
PrEs per embryo over culture. r, Dynamics of CK7⁺ TrBs per embryo over
culture. EPIs and PrEs maintained gradual proliferation at 8-10 d.p.f., after
which their proliferation speeds accelerated. However, TrBs always maintained
a rapid proliferation rate, which may be for establishing cell connections with
the maternal environment. Scale bars, 25 μm.
Extended Data Fig. 6 | Z-series of a 14-d.p.f. embryo with a PSA. This figure is
related to Fig. 3. Series of confocal z-sections of the embryo stained for T
(green), OCT4 (red) and N-cadherin (grey), showing formation of PSA. The
thickness of every section was 12 μm. Numbers on the left indicate the number
of sections. Red and white arrows indicate T⁺ and N-cadherin⁺ cells,
respectively. Some T⁺ cells (red arrows) in the 13# section were located in the
amnion epithelium, consistent with the distribution of some T⁺ cells in the
monkey amnion epithelium. T⁺ cells (red arrows) in the 16# section disrupted the
N-cadherin-forming basement-membrane barriers between epiblast and
hypoblast and focally migrated from the embryonic disc to generate the PSA.
Similar data were observed in 3 out of 3 embryos from two experiments. Scale
bars, 50 μm.
Extended Data Fig. 7 | Z-series of a 14-d.p.f. embryo with a cell emigration
region. This figure is related to Fig. 3. Series of confocal z-sections of the
embryo stained for GATA6 (green), OCT4 (red) and T (grey). The thickness of
every section was 12 μm. Numbers on the left indicate the number of sections.
Red and white arrows denote T⁺GATA6⁺OCT4⁻ cells and T⁺GATA6⁺OCT4⁺ cells,
respectively. In the 7# and 8# sections, some migrating T⁺ EPIs invaded the
space near the visceral endoderm and co-expressed GATA6 (white arrows).
Similar data were observed in 4 out of 5 embryos from three experiments. Scale
bars, 50 μm.


Extended Data Fig. 8 | Development of TrB lineage. a-i, Representative
confocal staining of human embryo sections. a, Staining of CK7 and F-actin. n = 5
independent embryos from three independent experiments. b, c, Cells near
the EPI-PrE bilayer had a single nucleus that expressed TEAD4 and E-cadherin.
E-cadherin displayed symmetrical distributions in the embryos (3 out of 3
embryos for each staining from two experiments). d, e, Multinucleated cells
(d1, e1 and e2) and cells with a single nucleus (d2) (3 out of 3 embryos for each
staining from two experiments). Dashed lines in d and e outline the region of
interest in cells. f, g, Representative staining of CK7 and HLA-G (3 out of 3
embryos) in 12-d.p.f. (f) and 14-d.p.f. (g) human embryos. h, i, Representative
staining of TEAD4 and HLA-G (h) or TEAD4 and hCG (i) in 14-d.p.f. human
embryos (3 out of 3 embryos). In f, the inset is a magnification of the square.
In g-i, the right panel is a magnification of the square. In i, the inset is a
magnification of the region indicated by a red arrow. j-l, t-SNE plot of 352 TrBs.
Cells (dots) are coloured according to the original inferred lineage identity (k)
and embryonic stage (l).
m, Lineage segregation path constructed by Monocle based on developmental
time (left) and cell types defined with selected markers (right). n, Heat map of
polypeptide hormone genes expressed in the six types of trophoblasts during
culture. o, Heat map indicates expression of genes specific for each cell type.
Representative genes and key GO enrichments are shown. GO terms and KEGG
pathways of genes specific for different subtypes of TrBs (multiple test
corrected P < 0.01, hypergeometric tests) in the six types of TrBs from
pre-implantation stage embryos to 14-d.p.f. embryos are shown in Supplementary
Table 4. Pre-CTB-expressed genes, related to cell metabolism, biosynthesis
and cell differentiation, were in accord with the characteristics of trophoblast
stem cells. High expression of NF-κB, as well as canonical and non-canonical
Wnt signalling pathway genes, indicates potential functions in CTB
development. STB-specific genes indicated hormone secretion, whereas
early-STB-expressed genes were associated with cell differentiation and
migration, dependent on several signalling pathways. EVT-specific genes
contributed to regulating the immune system and angiogenesis. Scale bars,
50 μm (a-c, e-i) or 20 μm (b).


Extended Data Fig. 9 | Epiblast development during embryo culture in the
3D condition. This figure relates to Fig. 4. a-d, Dynamic expression of
pluripotency genes over human embryo development in the 3D condition. a, The
dynamic expression of OCT4, TFCP2L1 and KLF4 during human embryo
development (3 out of 3 embryos). b, The dynamic expression of OCT4, SOX17
and KLF17 during human embryo development (3 out of 3 embryos). Loss of
TFCP2L1, KLF4 and KLF17 at 10 d.p.f. (implantation stage) indicates the
pluripotent state transition of epiblasts. c, Expression of NANOG and PRDM14
in 10-d.p.f. human embryos (2 out of 2 embryos). d, Dynamic expression of
OCT4, CD24 (a primed pluripotency gene) and KLF17 during human embryo
development (2 out of 2 embryos). e, t-SNE of pluripotent stem cells. To exclude
cells from the AME and intermediate state in the epiblast cluster, we excluded
the NANOG-negative cells and maintained 136 cells with high expression of
NANOG. f, The violin plots show the dynamics of naive, primed and common
pluripotency genes in pre-gastrulation embryos. All violins have the same
maximum width; black dot denotes the mean. AME and intermediate state cells
were excluded in the synthesis. *P < 0.05, **P < 0.01, ***P < 0.001, two-sided
Wilcoxon rank-sum test. g-j, PluriNetWork analysis of EPIs from ICM (g),
pre-EPI (h), post-EPI (i) and PSA-EPI (j) stages revealed that key pluripotency
regulators dominated the networks. k, DEGs during EPI development. GO
terms and representative genes in DEGs in the pairwise comparisons are
indicated. l-n, Scatter-plot comparison of the gene-expression levels between
ICM and pre-EPI (l), pre-EPI and post-EPI (m), and post-EPI and PSA-EPI (n). Key
genes are annotated (Supplementary Table 6). Red denotes upregulated; green
denotes downregulated; > twofold difference, uncorrected P < 0.01 (likelihood
ratio test) and median FPKM > 1 in one group. Scale bars, 50 μm (a, b), 100 μm (c)
or 25 μm (d).
Extended Data Fig. 10 | Comparison of monkey and human EPI development.
a, PCA of the EPI lineage as determined by the expressed genes among all
groups of EPIs during development in human (hu) (circles) and cynomolgus
monkey (cy) (squares). In total, 12,475 out of 16,487 annotated genes expressed
among human and monkey cells (human, 222 cells; monkey, 213 cells) were
used. b, Heat map of 966 genes that highly contributed to PC1 (> 2 s.d. of PC1).
c, Heat map of 1,152 genes with significant scores for PC2 and PC3 loading
(radius of PC2 and PC3 > 3 s.d.) during monkey and human EPI development.
None of the 1,151 genes overlapped with 996 genes with significant scores for
PC1 loading. d, The violin plots of pluripotency genes over EPI pluripotency
transition in monkey embryos. Monkey scRNA-seq data were from a published
database (ref. 39). e, The violin plots of pluripotency genes during the EPI
pluripotency transition in human embryos. We observed stable expression of
STAT3 and TBX3 with a trend of gradually increasing expression of UTF1,
NR0B1, LIFR and SOX15 during human EPI pluripotent state transition. f, The
violin plots show dynamics of BMP signalling pathway gene expression in EPI
pluripotency transition of human and monkey embryos. Monkey scRNA-seq data
were obtained from a published database (ref. 39). g, The violin plots show
dynamics of FGF signalling pathway gene expression in EPI pluripotency
transition of human and monkey embryos. h, The violin plots show dynamics of
Notch signalling pathway gene expression in EPI pluripotency transition of
human embryos. In d-h, all violins have the same maximum width; black dot
denotes the mean. In e-h, AME and intermediate state cells were excluded in the
synthesis and only 136 single cells were included. Monkey scRNA-seq data were
obtained from a published database (ref. 39). See also Supplementary Table 7.
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection LAS X (2.6.0 build 7266) software was used to take pictures and 3D videos. 


Data analysis The software used for scRNA-seq data analysis has been described in Materials and Methods, including the key parameters. ImageJ 
(version 1.51j8) was used to count cell numbers; FASTQC (version 0.11.8) and MULTIQC (v1.6) were used for quality control; HISAT2 
(v2.1.0) was used to align the scRNA-seq profiles to the human genome. StringTie (v1.3.4) was used to calculate the abundances of genes. 
Seurat (version 2.3.4) and Monocle (v2.4.0) in R were used to perform t-SNE and trajectory analysis of the scRNA-seq profiles. For cell 
numbers, the significance of the difference between two samples was evaluated by an unpaired two-sample Student's t-test using Excel 
(2016). 
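
For readers who want to reproduce the cell-number comparison outside a spreadsheet, the following is a minimal sketch of the unpaired two-sample Student's t statistic with pooled variance. It is not the authors' workflow (they report using Excel); the function name `students_t` and the cell counts are hypothetical and chosen only for illustration. The sketch returns the t statistic and degrees of freedom; a P value would additionally need the t distribution, for example from `scipy.stats.ttest_ind(a, b, equal_var=True)`.

```python
import math
from statistics import mean, variance


def students_t(a, b):
    """Unpaired two-sample Student's t statistic, assuming equal variances.

    Returns (t, degrees_of_freedom). `variance` is the sample variance
    (n - 1 denominator), as required for the pooled estimate.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / dof
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / std_err, dof


# Hypothetical cell counts from two groups of embryos (illustrative only).
counts_a = [10, 12, 14]
counts_b = [4, 6, 8]
t_stat, dof = students_t(counts_a, counts_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

With three embryos per group the test has only four degrees of freedom, so the result is sensitive to each individual count.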


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 
Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


Source data for Extended Data Figs. 1f, 1k, 1l, 1m, 4b, 4f, 4p, 5g, 5h, 5l, 5m, 5p, 5q, 5r, 9f, 10d, 10e, 10f, 10g and 10h are provided with the paper. The single-cell 
RNA-sequencing data have been deposited in the GEO. Accession numbers for the data generated in this study and for the published data used in this study are as 
follows. The scRNA-seq data in this study: GSE136447; the SC3-seq data of cynomolgus monkey embryos (for Extended Data Fig. 10): GSE74767 (ref. 39); the 
scRNA-seq data of human pre-implantation embryos (for Extended Data Fig. 3d, e): GSE66507 (ref. 16), GSE36552 (ref. 17) and E-MTAB-3929 (ref. 18). 
Sample size No statistical methods were used to predetermine sample size. The number of embryos used in each experiment is provided in Methods 
and figure legends. For experiments with some variation, we examined at least three embryos. For experiments with highly consistent data, 
we used at least two embryos because of the limited number of embryos available. For RNA-seq, the sample size was determined when the main cell lineages at each 
developmental stage were captured. Related statistical analysis provides the rationale for sufficiency of the sample sizes. 
Data exclusions Two scRNA-seq profiles in which fewer than 2000 genes had abundance levels of more than 1 FPKM were excluded. This 
criterion was established during the analysis, when we found that the two scRNA-seq profiles with < 2000 expressed genes were isolated 
far away from the other scRNA-seq profiles. 
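
The exclusion rule above can be sketched as a simple filter. This is an illustrative example only, not the authors' code: the function names, the dictionary layout (cell name mapped to per-gene FPKM values) and the toy values are assumptions. Only the two thresholds (more than 1 FPKM per gene, at least 2000 such genes per profile) come from the text.

```python
def count_expressed(fpkm_by_gene, threshold=1.0):
    """Number of genes whose abundance exceeds `threshold` FPKM."""
    return sum(1 for value in fpkm_by_gene.values() if value > threshold)


def split_profiles(profiles, min_genes=2000, threshold=1.0):
    """Split profiles into (kept, excluded).

    A profile is excluded when fewer than `min_genes` genes are
    expressed above `threshold` FPKM.
    """
    kept, excluded = {}, {}
    for name, fpkm_by_gene in profiles.items():
        if count_expressed(fpkm_by_gene, threshold) < min_genes:
            excluded[name] = fpkm_by_gene
        else:
            kept[name] = fpkm_by_gene
    return kept, excluded


# Toy data with three genes per cell, so the cutoff is lowered to 2 here;
# real profiles would use the default min_genes=2000.
toy_profiles = {
    "cell_A": {"GENE1": 5.2, "GENE2": 3.1, "GENE3": 0.4},
    "cell_B": {"GENE1": 0.9, "GENE2": 0.0, "GENE3": 1.7},
}
kept, excluded = split_profiles(toy_profiles, min_genes=2)
print("kept:", sorted(kept), "excluded:", sorted(excluded))
```

Note that the text describes the cutoff as chosen after inspecting the data, so a filter like this documents the rule rather than justifying it.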


Replication Methods and figure legends indicated the exact number of embryos replicated in each experiment. All attempts at replication were 
successful. 


Randomization The embryos used in each experiment were chosen at random. The experiments were not randomized. 


Blinding The investigators were not blinded to allocation during experiments and outcome assessment. Data collection and analysis were performed 
by different people, and the sample classifications were replaced by simple marks during data analysis. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems Methods 
n/a | Involved in the study n/a | Involved in the study 
Antibodies ChIP-seq 
Eukaryotic cell lines Flow cytometry 
Palaeontology MRI-based neuroimaging 


Animals and other organisms 


Human research participants 


Clinical data 


Antibodies 


Antibodies used Immunostaining: 
mouse anti-OCT3/4 (Santa Cruz, SC5279, C-10, H1612, 1:400), rabbit anti-Brachyury (T) (Santa Cruz, SC20109, polyclonal, A0616, 
1:50), rabbit anti-SOX2 (Millipore, AB5603, polyclonal, 2826070, 1:400), goat anti-SOX17 (R&D Systems, AF1924, polyclonal, 
KGA0815042, 1:250), rabbit anti-KLF4 (Millipore, 09-821, polyclonal, 2383578, 1:400), rabbit anti-KLF17 (Atlas Antibodies, 
HPA024629, polyclonal, C117502, 1:250), rabbit anti-β-Catenin (Abcam, AB32572, E247, GR184212-37, 1:300), mouse anti-E-cadherin 
(Abcam, AB76055, M168, GR299147-4, 1:100), rabbit anti-Cytokeratin 7 (CK7) (Abcam, AB181598, EPR17078, 
GR3214132-10, 1:300), rabbit anti-N-cadherin (Abcam, AB12221, polyclonal, 40975, 1:200), mouse anti-OTX2 (Santa Cruz, 
SC514195, D-8, GO816, 1:100), goat anti-LEFTY1 (R&D Systems, AF746, polyclonal, CMM0111101, 1:100), goat anti-SOX1 (R&D 
Systems, AF3369, polyclonal, XUVO618081, 1:500), rabbit anti-PAX6 (Biolegend, 901301, Poly 19013, b267205, 1:500), rabbit 
anti-Laminin (Sigma-Aldrich, L9393, polyclonal, O28M4890V, 1:50), rabbit anti-TEAD4 (Atlas Antibodies, HPA056896, polyclonal, 
R78063, 1:150), goat anti-FOXA2 (Santa Cruz, SC6554, polyclonal, D1216, 1:100), goat anti-OCT3/4 (Santa Cruz, SC8628, 
polyclonal, G3201, 1:250), mouse anti-hCG (Abcam, AB9582, 5H4-E2, GR308272-2, 1:100), rabbit anti-PRDM14 (Millipore, 
AB4350, polyclonal, 2897240, 1:50), goat anti-TFCP2L1 (R&D Systems, AF5726, polyclonal, CCUGO115021, 1:200), goat anti-NANOG 
(R&D Systems, AF1997, polyclonal, KKJO514091, 1:250), mouse anti-PODXL (R&D Systems, MAB1658, 222328, 
JKW0218041, 1:400), mouse anti-HLA-G (Abcam, AB52455, 4H84, GR251679-19, 1:200), goat anti-GATA6 (R&D Systems, 
AF1700, polyclonal, KWT0419021, 1:200), mouse anti-EZRIN (Sigma-Aldrich, E8897, 3C12, 117M4875V, 1:500), rabbit anti-CER1 
(Sigma-Aldrich, HPA019917, polyclonal, R10176, 1:50), Phalloidin (F-actin) Alexa Fluor® 488 (Thermo Fisher Scientific, A12379, 
direct labeled, 1749905, 1:300), rabbit anti-HESX1 (Abcam, AB246949, polyclonal, GR3267093-1, 1:100), goat anti-Brachyury (T) 
(R&D Systems, AF2085, polyclonal, KQP0617031, 1:200), goat anti-FLK-1 (KDR) (R&D Systems, AF357, polyclonal, CVE0617081, 
1:100) and WGA Alexa Fluor® 647 (Invitrogen, W32466, direct labeled, 1988457, 1:500). 


Validation All the antibodies have been validated by the companies that supplied them. This information was used for further 
validation of the antibodies used in this work. Details of the validation statements, antibody profiles and relevant citations can be 
found on the manufacturers' websites. 

All the antibodies used in this work are for immunostaining purposes only. We list the manufacturers' immunostaining validation 
and the number of citations as follows: 


1. Oct3/4 (SC5279): https://www.scbt.com/p/oct-3-4-antibody-c-10 

Oct3/4 (SC5279) antibody was validated by the manufacturer using glandular cells and mouse embryos. We verified that this 
antibody stains the nuclei of ICM and EPIs, as expected. More than 1303 citations. 

2. Brachyury (T) (SC20109): https://www.scbt.com/zh/p/brachyury-antibody-h-210 

Brachyury (T) (SC20109) antibody was validated by the manufacturer using human lung tissue. We verified that this antibody 
stains the nuclei of posterior EPIs, as expected. More than 9 citations. 

3. Sox2 (AB5603): http://www.merckmillipore.com/CN/zh/product/Anti-Sox2-Antibody,MM_NF-AB5603 

Sox2 (AB5603) antibody was validated by the manufacturer using H9 human stem cells. We verified that this antibody stains the 
nuclei of EPIs, as expected. More than 166 citations. 

4. Sox17 (AF1924): https://www.rndsystems.com/cn/products/human-sox17-antibody_af1924 

Sox17 (AF1924) antibody was validated by the manufacturer using the B16 mouse cell line and human BG01V cells. We verified that 
this antibody stains the nuclei of morula and PrEs, as expected. More than 123 citations. 

5. Klf4 (09-821): http://www.merckmillipore.com/CN/zh/search/09-821 

Klf4 (09-821) antibody was validated by the manufacturer using NIH/3T3, A431 and HeLa cells. We verified that this antibody 
stains the nuclei of early EPIs. More than 4 citations. 

6. Klf17 (HPA024629): https://www.atlasantibodies.com/products/antibodies/primary-antibodies/triple-a-polyclonals/klf17-antibody-hpa024629 

Klf17 (HPA024629) antibody was validated by the manufacturer using human testis and tonsil tissues. We verified that this 
antibody stains the nuclei of early EPIs. More than 6 citations. 

7. β-Catenin (AB32572): https://www.abcam.cn/beta-catenin-antibody-e247-chip-grade-ab32572.html 

β-Catenin (AB32572) antibody was validated by the manufacturer using A431 and wild-type HAP1 cells. We verified that this 
antibody stains the membrane of the embryo cells. More than 335 citations. 

8. E-cadherin (AB76055): https://www.abcam.cn/e-cadherin-antibody-m168-c-terminal-ab76055.html 

E-cadherin (AB76055) antibody was validated by the manufacturer using A431 cells. We verified that this antibody stains the 
membrane of embryo cells that did not express N-cadherin. More than 137 citations. 

9. Cytokeratin 7 (CK7) (AB181598): https://www.abcam.cn/cytokeratin-7-antibody-epr17078-cytoskeleton-marker-ab181598.html 

Cytokeratin 7 (CK7) (AB181598) antibody was validated by the manufacturer using A549 cells. We verified that this antibody stains 
the membrane of trophoblast. More than 24 citations. 

10. N-cadherin (AB12221): https://www.abcam.cn/n-cadherin-antibody-ab12221.htm 

N-cadherin (AB12221) antibody was validated by the manufacturer using mouse differentiated embryonic stem cells. We verified 
that this antibody stains the membrane of mesenchymal cells. More than 84 citations. 

11. Otx2 (SC514195): https://www.scbt.com/zh/p/otx2-antibody-d-8 

Otx2 (SC514195) antibody was validated by the manufacturer using Jurkat, Hep G2 and HeLa nuclear extracts, and with differentiated 
hES cells in our lab. We verified that this antibody stains the nuclei of parts of hypoblast cells. More than 2 citations. 

12. Lefty1 (AF746): https://www.rndsystems.com/cn/products/human-mouse-lefty-antibody_af746 

Lefty1 (AF746) antibody was validated by the manufacturer using mouse uterus, as reported in the citation. We verified that this 
antibody stains the cytoplasm of AVE cells, as expected. More than 1 citation. 

13. Sox1 (AF3369): https://www.rndsystems.com/cn/products/human-mouse-rat-sox1-antibody_af3369 

Sox1 (AF3369) antibody was validated by the manufacturer using ectoderm-differentiated BG01V human embryonic stem cells. 
We verified that this antibody stains neural stem cells in our lab. More than 30 citations. 

14. Pax6 (901301): https://www.biolegend.com/en-us/products/purified-anti-pax-6-antibody-11511 

Pax6 (901301) antibody was validated by the manufacturer using frozen human iPSC-derived neural rosettes. We verified that this 
antibody stains neural stem cells in our lab. More than 97 citations. 

15. Laminin (L9393): https://www.sigmaaldrich.com/catalog/product/sigma/l9393 

Laminin (L9393) antibody was validated by the manufacturer using human tongue sections. We verified that this antibody stains 
basement membrane, as expected. More than 776 citations. 

16. Tead4 (HPA056896): https://www.atlasantibodies.com/products/antibodies/primary-antibodies/triple-a-polyclonals/tead4-antibody-hpa056896/ 

Tead4 (HPA056896) antibody was validated by the manufacturer using human A431 cells. We verified that this antibody stains the 
nuclei of parts of EPIs and TEs, as expected. More than 4 citations. 

17. Foxa2 (SC6554): https://www.scbt.com/zh/p/hnf-3beta-antibody-m-20 

Foxa2 (SC6554) antibody was validated by the manufacturer using HepG2 cells. We verified that this antibody stains the nuclei of 
neural stem cells in our lab. More than 66 citations. 

18. Oct3/4 (SC8628): https://www.scbt.com/zh/p/oct-3-4-antibody-n-19 

Oct3/4 (SC8628) antibody was validated by the manufacturer using mouse embryos. We verified that this antibody stains the 
nuclei of ICM and EPIs, as expected. More than 79 citations. 

19. hCG (AB9582): https://www.abcam.cn/hcg-beta-antibody-5h4-e2-ab9582.html 

hCG (AB9582) antibody was validated by the manufacturer using human chorionic villus cells. We verified that this antibody 
stains the AME, as expected. More than 8 citations. 

20. PRDM14 (AB4350): http://www.merckmillipore.com/CN/zh/product/Anti-PRDM14-Antibody,MM_NF-AB4350#anchor_REF 

PRDM14 (AB4350) antibody was validated by the manufacturer using human lung tissue lysate. We verified that this antibody 
stains the early EPIs, as expected. More than 1 citation. 

21. TFCP2L1 (AF5726): https://www.rndsystems.com/cn/products/human-tfcp2l1-antibody_af5726 

TFCP2L1 (AF5726) antibody was validated by the manufacturer using human placental tissue. We verified that this antibody 
stains the early EPIs and naive stem cells in our lab. More than 5 citations. 

22. NANOG (AF1997): https://www.rndsystems.com/cn/products/human-nanog-antibody_af1997 

NANOG (AF1997) antibody was validated by the manufacturer using BG01V human stem cells and embryoid bodies. We verified 
that this antibody stains the ICM and EPIs, as expected. More than 111 citations. 

23. PODXL (MAB1658): https://www.rndsystems.com/cn/products/human-podocalyxin-antibody-222328_mab1658 

PODXL (MAB1658) antibody was validated by the manufacturer using BG01V human stem cells. We verified that this antibody 
indicates the polarity of EPIs, as expected. More than 7 citations. 

24. HLA-G (AB52455): https://www.abcam.cn/hla-g-antibody-4h84-ab52455.html 

HLA-G (AB52455) antibody was validated by the manufacturer using human spleen tissue. We verified that this antibody stains 
the cytoplasm of parts of TEs, as expected. More than 11 citations. 

25. GATA6 (AF1700): https://www.rndsystems.com/cn/products/human-gata-6-antibody_af1700 

GATA6 (AF1700) antibody was validated by the manufacturer using human gastric carcinoma cell line. We verified that this 
antibody stains the nuclei of hypoblast, as expected. More than 29 citations. 

26. EZRIN (E8897): https://www.sigmaaldrich.com/catalog/product/sigma/e8897?lang=zh&region=CN 

EZRIN (E8897) antibody was validated by the manufacturer using A431 cells. We verified that this antibody stains the apical 
surface of EPIs and AME, as expected. More than 40 citations. 

27. CER1 (HPA019917): https://www.sigmaaldrich.com/catalog/product/sigma/hpa019917?lang=zh&region=CN 

CER1 (HPA019917) antibody was validated by the manufacturer using human small intestine. We verified that this antibody 
stains the AVE, as expected. More than 1 citation. 

28. Phalloidin (F-actin) Alexa Fluor® 488 (A12379): https://www.thermofisher.com/order/catalog/product/A12379?SID=srch-hj-A12379 

Phalloidin (F-actin) Alexa Fluor® 488 (A12379) antibody was validated by the manufacturer using bovine pulmonary artery 
endothelial cells and muntjac skin fibroblasts. We verified that this antibody stains the AVE, as expected. More than 176 
citations. 

29. HESX1 (AB246949): https://www.abcam.cn/hesx1-antibody-ab246949.html 

HESX1 (AB246949) antibody was validated by the manufacturer using human testis and skin tissue. We verified that this 
antibody stains the nuclei of part of EPIs, as expected. No citations at present. 

30. Brachyury (T) (AF2085): https://www.rndsystems.com/cn/products/human-mouse-brachyury-antibody_af2085 

Brachyury (T) (AF2085) antibody was validated by the manufacturer using differentiated human embryonic stem cells and BG01V 
human stem cells. We verified that this antibody stains the nuclei of posterior EPIs, as expected. More than 56 citations. 

31. FLK-1 (KDR) (AF357): https://www.rndsystems.com/cn/products/human-vegfr2-kdr-flk-1-antibody_af357 

FLK-1 (KDR) (AF357) antibody was validated by the manufacturer using human placenta and kidney tissues. We verified that this 
antibody stains the extraembryonic mesoderm. More than 25 citations. 

32. WGA Alexa Fluor® 647 (W32466): https://www.thermofisher.com/order/catalog/product/W32466?SID=srch-hj-W32466 

WGA Alexa Fluor® 647 (W32466) antibody was validated by citations using mouse cardiomyocytes on the manufacturer's 
website. We verified that this antibody stains the apical surface of EPIs and AME. More than 4 citations. 


Human research participants 


Policy information about studies involving human research participants 


Population characteristics All embryos were collected from volunteers, including wives and husbands, who were no more than 35 years old. All volunteers 
have a normal chromosome karyotype and no history of hereditary disease. The embryo donors involved in this study are fertile, with at 
least one healthy baby. The donors have a normal body mass index and are in good health. 86% of couples were treated with normal IVF 
and 14% of couples were treated with ICSI. 


Recruitment Research donors in the study were recruited from The First People’s Hospital of Yunnan Province. Before giving consent, donors 
received proper counselling about the implications of the donation and potential risks. Embryos were collected with written 
informed consent from the donors in this study. 


Ethics oversight This work was approved by the Medicine Ethics Committee of The First People’s Hospital of Yunnan Province (2017LS[K]NO.035). 
The informed consent process for embryo donation complied with the International Society for Stem Cell Research (ISSCR) 
Guidelines for Stem Cell Research and Clinical Translation (2016) and the Ethical Guidelines for Human Embryonic Stem Cell Research 
(2003) jointly issued by the Ministry of Science and Technology and the Ministry of Health of the People’s Republic of China. The 
Medicine Ethics Committee of The First People’s Hospital of Yunnan Province is composed of 9 members, including lawyers, 
scientists and clinicians with relevant expertise. The Committee evaluated the scientific merit and ethical justification of this 
study and conducted a full review of the donations and use of these samples. No financial inducements were offered for the 
donations. In the process, couples were informed that their embryos would be used to study the developmental mechanisms of 
human embryos and that their donation would not affect their IVF cycle. The culture of all embryos was terminated at day 14 
post-fertilization or upon the appearance of primitive streak anlage. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 
Although maternal antibodies protect newborn babies from infection, little is 
known about how protective antibodies are induced without prior pathogen 
exposure. Here we show that neonatal mice that lack the capacity to produce IgG are 
protected from infection with the enteric pathogen enterotoxigenic Escherichia coli 
by maternal natural IgG antibodies against the maternal microbiota when antibodies 
are delivered either across the placenta or through breast milk. By challenging pups 
that were fostered by either maternal antibody-sufficient or antibody-deficient dams, 
we found that IgG derived from breast milk was crucial for protection against mucosal 
disease induced by enterotoxigenic E. coli. IgG also provides protection against 
systemic infection by E. coli. Pups used the neonatal Fc receptor to transfer IgG from 
milk into serum. The maternal commensal microbiota can induce antibodies that 
recognize antigens expressed by enterotoxigenic E. coli and other Enterobacteriaceae 
species. Induction of maternal antibodies against a commensal Pantoea species 
confers protection against enterotoxigenic E. coli in pups. This role of the microbiota 
in eliciting protective antibodies to a specific neonatal pathogen represents an 
important host defence mechanism against infection in neonates. 


Neonates are highly susceptible to microbial infections, not only 
because their immature immune system is less capable of generating 
adaptive immune effectors such as antibodies, but also because 
they lack a diverse commensal microbiota that can antagonize pathogens 
independently of host responses. Neonates acquire maternal 
antibodies through the placenta and through breast milk; however, in 
humans, antibodies derived from breast milk are dominated by secretory 
IgA antibodies, which are thought to exert their protective function 
on neonatal mucosal surfaces through mechanisms such as toxin or 
adhesin neutralization and bacterial agglutination. Passive immunity 
to various pathogenic bacterial and viral infections (such as group B 
Streptococcus, Haemophilus influenzae and influenza viruses) can be 
transferred to neonates by maternal antigen-specific IgG antibodies 
induced by maternal colonization or vaccination. 

Although the benefits of maternal antibodies are widely accepted, 
few studies have addressed whether maternal natural antibodies 
(mNabs), that is, antibodies acquired without known exposure to the 
pathogen or through immunization, can help neonates to defend 
against pathogens. Although the commensal microbiota can shape 
the antibody repertoire, how the diversity in mNabs is induced or 
how they mediate protection against infectious agents postnatally are 
unknown. Here we show that mNabs protect neonatal mice against both 
enteric and systemic infections with enterotoxigenic E. coli (ETEC). 
Notably, we found that the induction of mNabs depends on the commensal 
microbiota in pregnant dams. We show that a single commensal 
species can induce cross-reactive mNabs that protect against ETEC in 
pups. In addition to acquisition through the placenta, pups can assimilate 
IgG mNabs directly from ingested milk into serum by a neonatal 
Fc receptor (FcRn)-dependent process. Our results provide insights 
into how the commensal microbiota of pregnant female mice drives 
antibody-dependent immunity in neonates through breast-feeding 
and demonstrate that protective IgG antibodies in breast milk act both 
locally and systemically. 


Mouse mNabs protect neonates against ETEC 


To analyse the developmental dynamics of neonatal antibodies, we used 
a reciprocal breeding strategy that enabled the tracking of maternal 
antibody persistence and antibody development dynamics in neonates. 
Maternal source, persistence and development of neonatal age-related 
IgG, IgA and IgM are shown in Extended Data Fig. 1. For the first 3 weeks, 
serum and mucosal IgG and IgA levels in pups depend completely on 
the maternal μMT (also known as Ighm) genotype (μMT−/− mice lack 
mature B cells). Through this breeding strategy, we can produce pups 
that are either deficient (mNab−) or sufficient (mNab+) in maternal 
natural IgG and IgA. 

Transfer of vaccine-induced, antigen-specific antibodies confers 
passive protection in models of neonatal infection. To test whether 
mNabs in unimmunized mice protect against an enteric pathogen, 
we challenged reciprocally bred 6- to 7-day-old pups with the human 
Fig. 1 | mNabs protect neonates from an enteric bacterial pathogen. 

a, Bacterial burdens of reciprocally bred mNab+ and mNab− pups (6-7 days old) 
orally challenged with 10^7 CFU of ETEC 6. Ig, immunoglobulin; SI, small 
intestine. **P = 0.0004, two-tailed Mann-Whitney U-test. Data are 
representative of four independent experiments. b, Survival among 
reciprocally bred pups 20 h after oral-gastric challenge with 10° CFU of ETEC 6. 
Data are from three independent experiments (first experiment, n = 8 mNab+ 
mice, n = 5 mNab− mice; second experiment, n = 9 mNab+ mice, n = 9 mNab− 
mice; third experiment, n = 7 mNab+ mice, n = 6 mNab− mice). *P = 0.0011, 
two-tailed unpaired Student’s t-test. c, Serum IgG levels in ETEC-challenged 
reciprocally bred pups. ***P = 0.0002, two-tailed Mann-Whitney U-test. Data 
are representative of two independent experiments. d, Small-intestinal 
mucosal IgG levels in ETEC-challenged reciprocally bred pups. **P = 0.0022, 
two-tailed Mann-Whitney U-test. Data are representative of two independent 


clinical isolate ETEC strain 6 (hereafter ETEC 6). ETEC 6 colonizes the 
small intestine of neonatal mice and typically causes acute and lethal 
diarrhoeal disease within 20 h of oral-gastric challenge. At a sub-lethal 
dose of ETEC 6 (10^7 colony-forming units; CFU), mNab+ pups were more 
resistant to infection than mNab− pups and displayed a 33-fold reduction 
in intestinal colonization of ETEC 6 (Fig. 1a). Stratification by genotype 
showed no difference in bacterial burden between μMT+/− and 
μMT−/− pups. At a higher dose (10° CFU), all mNab+ pups were resistant 
to ETEC 6 challenge, whereas 83% of mNab− pups became moribund 
or had died within 20 h after challenge (Fig. 1b). The postnatal time of 
our ETEC challenge is too early for antigen-driven endogenous production 
of IgA and IgG; thus, the protective effects depend on maternally 
derived antibodies. We verified that IgG was detected in serum (Fig. 1c) 
and in gut luminal extracts (Fig. 1d) of only the mNab+ pups. We also 
challenged reciprocally bred pups intraperitoneally and found that 
mNab+ pups were more resistant to systemic infection with ETEC than 
mNab− pups (Extended Data Fig. 2a). Previous studies showed that 
natural IgM antibodies have broad specificity and provide protection 
against bacterial and viral infections. However, natural IgM cannot 
be vertically transmitted from dams to pups (Extended Data Fig. 1e) 
and therefore is unlikely to play an important part in the protection 
against ETEC observed in our study. 
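
The bacterial-burden comparisons in this section are reported with two-tailed Mann-Whitney U-tests. As a rough illustration of how such a test behaves with the small group sizes typical of these experiments, the sketch below computes the U statistic and an exact two-sided P value by enumerating every way of splitting the pooled values into two groups. It is not the authors' analysis: the function names and the CFU values are hypothetical, and the enumeration is only practical for small samples.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: each (x, y) pair scores 1 if x > y, 0.5 if tied."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided P value by enumerating all group assignments.

    Counts the relabellings whose U lies at least as far from its
    null expectation (n1 * n2 / 2) as the observed U.
    """
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    centre = n1 * n2 / 2
    observed = abs(mann_whitney_u(x, y) - centre)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point comparison noise.
        if abs(mann_whitney_u(group_x, group_y) - centre) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical small-intestinal burdens (CFU); illustrative values only.
burden_mnab_pos = [2e3, 5e3, 1e4, 3e4]
burden_mnab_neg = [1e5, 2e5, 4e5, 9e5]
u_stat = mann_whitney_u(burden_mnab_pos, burden_mnab_neg)
p_value = exact_two_sided_p(burden_mnab_pos, burden_mnab_neg)
print(f"U = {u_stat}, exact two-sided P = {p_value:.4f}")
```

With four pups per group, even complete separation of the two groups cannot give a P value below 2/70, which is one reason the group sizes reported in the legends matter when reading the significance levels.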

Using flow cytometry analysis, we investigated which antibody 
class was likely to mediate protection. Commensal bacteria from the 
experiments. e, Flow cytometry analysis of natural maternal IgG and IgA 
coating of commensal bacteria of 1-week-old mNab+ and mNab− pups. Data are 
representative of two independent experiments (n = 4-5 mice per group in 
each experiment). f, Flow cytometry analysis of natural maternal IgG and IgA 
coating of ETEC-GFP bacteria in mNab+ and mNab− pups 18 h after infection. 
IgG and IgA signals are gated on the GFP+ population. Data are representative of 
two independent experiments (n = 4-7 mice per group in each experiment). 
g, Serum IgG levels after 1 week of cross-fostering. h, Small-intestinal IgG levels 
after 1 week of cross-fostering. i, Small-intestinal IgA levels after 1 week of 
cross-fostering. j, ETEC 6 bacterial burdens in the small intestine of pups 
cross-fostered for 1 week. ***P = 0.0002, two-tailed Mann-Whitney U-test. Data are 
representative of two independent experiments. a-d, g-j, Data are 
mean ± s.e.m. Specific n numbers are indicated. 


microbiota of uninfected mNab+ pups were coated with both IgG and 
IgA, whereas bacterial cells from mNab− pups were negative for IgG and 
IgA (Fig. 1e), indicating that both maternal IgG and IgA that react with 
the commensal microbiota are transmitted vertically to neonates. Flow 
cytometry detected only IgG, but not IgA, on green-fluorescent protein 
(GFP)-expressing ETEC (ETEC-GFP) cells (Fig. 1f). It has previously 
been shown that immunization-induced antigen-specific milk IgG coats 
Citrobacter in the mucosa. Our results re-affirm that maternal natural 
IgG in milk coats pathogenic bacterial cells (in our study ETEC 6), and 
further demonstrate that protection is conferred to breast-feeding 
pups, even without prior exposure of the mother to the pathogen. 

The composition of the neonatal gut microbiota in mNab+ and mNab− 
animals was similar and therefore probably not responsible for the 
differential protection against ETEC (Extended Data Fig. 2b). Exposure 
to maternal antibodies also suppressed the transcription of type-1 
interferon-related genes in the small intestine of ETEC-infected pups 
(Extended Data Fig. 2c). 


Milk mNabs are critical to ETEC protection 


To determine whether milk-acquired antibodies are protective, we orally 
challenged two groups of cross-fostered pups with 10° CFU of ETEC 6 
(Extended Data Fig. 3a). Thus, mNab− pups were fostered by μMT+/− dams 
and received their antibodies (designated μMT−/−-to-μMT+/− pups) for 
Fig. 2 | FcRn mediates postnatal IgG retro-transport. a, Breeding and 
fostering strategy to specifically study the postnatal milk IgG transfer process. 
All pups discussed in this figure are μMT−/−. b, Serum IgG levels in 1-week-old 
FcRn-deficient or FcRn-sufficient μMT−/− pups after 1 week of fostering by a 
μMT+/− dam. **P = 0.0013, two-tailed Mann-Whitney U-test. Data are 
representative of two independent experiments. c, Titres in pups of IgG 
specific to the microbiota of the foster dam. Data are representative of two 
independent experiments. d, Adult (8-week-old) mice were intraperitoneally 
(i.p.) injected with 5 mg of IgG, and faeces samples were collected 1 day later. 


1 week only from milk, whereas mNab+ pups were fostered by μMT−/− 
dams (that is, μMT+/−-to-μMT−/− pups). IgG titres of μMT−/−-to-μMT+/− pups 
in the serum and small intestine were significantly higher than those of 
μMT+/−-to-μMT−/− pups (Fig. 1g, h); the same trend was observed for IgA in 
the small intestine (Fig. 1i), and IgG and IgA in the colon (Extended Data 
Fig. 3b, c). After oral-gastric challenge with ETEC 6, μMT−/−-to-μMT+/− pups 
had a bacterial burden in the small intestine that was approximately 
30-fold lower than that of μMT+/−-to-μMT−/− pups (Fig. 1j). In another 
cross-fostering experiment, pups born to μMT−/− dams were divided into two 
groups and cross-fostered by μMT+/− dams or their own μMT−/− dams 
(Extended Data Fig. 3d). The mNab− pups fostered by μMT+/− dams 
(μMT−/−-to-μMT+/− pups) were resistant to ETEC 6 infection: 90% survived 
challenge at 20 h. By contrast, only 20% of mNab− pups raised by their 
own μMT−/− dams survived (Extended Data Fig. 3e). This result confirms 
the importance of milk-derived mNabs in blocking ETEC 6 colonization 
and provides the mechanism by which milk-derived antibodies protect 
against ETEC 6 challenge. 


FcRn uptake of milk mNabs into the serum 


In prenatal mice, FcRn transports IgG across the placenta to the fetus, 
primarily in the third trimester. In adult mice, FcRn transports IgG 
from the gut lamina propria to the intestinal lumen; mice deficient in 
FcRn display impaired resistance to the enteric pathogen Citrobacter 
rodentium. Because FcRn can have a high affinity for IgG and is 
expressed neonatally on intestinal epithelial cells, we investigated 
whether FcRn transports IgG in the opposite direction, binding IgG in 
milk in the intestinal lumen and delivering it to the serum of suckling 
pups. We used a breeding and fostering strategy to separate the postnatal 
from prenatal antibody-transfer processes (Fig. 2a). μMT−/−FcRn+/− 
(also known as Fcgrt+/−) mice were used to generate littermate pups 
deficient in maternal antibodies. Newborn pups were immediately 
fostered by a μMT+/− dam for 1 week. μMT−/− pups sufficient in FcRn had 
almost 1 mg of IgG per ml of serum, whereas μMT−/−FcRn−/− pups had no 




Faecal IgG levels are shown as microgram per gram of faeces. *P = 0.0357, 
two-sided Mann-Whitney U-test. Data are representative of two independent 
experiments. e, IgG treatment scheme of dams. f, Comparison of ETEC 6 
bacterial burden in the small intestine of pups from untreated μMT−/− dams and 
from μMT−/− dams treated with IgG. **P = 0.0043, two-sided Mann-Whitney 
U-test. Data are representative of two independent experiments. g, Serum 
IgG levels of pups from untreated μMT−/− dams compared with μMT−/− dams 
treated with IgG. Data are representative of two independent experiments. 
b-d, f, g, Data are mean ± s.e.m. Specific n numbers are indicated. 


detectable serum IgG (Fig. 2b). In FcRn-sufficient pups, some IgG was
directed towards the maternal microbiota (Fig. 2c). Thus, in suckling
mice, FcRn transports IgG from ingested milk into the serum. Although
FcRn can transport all subclasses of milk IgG to the neonatal circulation
(Extended Data Fig. 4a-e), the relative serum concentrations of IgG3
and IgG2c (pup:dam ratios) are the highest and lowest, respectively
(Extended Data Fig. 4f-j), suggesting that IgG3 is transferred preferentially
and IgG2c the least efficiently. We then compared the role of FcRn
in transferring IgG in adults versus neonates. We injected IgG (5 mg)
intraperitoneally into 8-week-old littermates representing two groups:
μMT−/− (no antibody production) FcRn−/− mice, or μMT−/− FcRn+/− (or FcRn+/+)
mice. We sampled the faeces 1 day later. μMT−/− FcRn+/− (or μMT−/− FcRn+/+)
mice had significantly higher faecal IgG levels than μMT−/− FcRn−/− mice
(Fig. 2d). Thus, IgG transfer from the systemic circulation to the intestinal
lumen primarily depends on FcRn. However, when we orally gavaged
IgG (5 mg) into these adult mice and sampled the serum 1 day later, we
found that both μMT−/− FcRn−/− and μMT−/− FcRn+/− (or μMT−/− FcRn+/+) mice
had detectable but similarly low IgG levels. This experiment suggests
that IgG transfer from lumen to serum in adult mice is poor and, in
contrast to that in neonates, is not dependent on FcRn (Extended
Data Fig. 4k); however, it is also possible that IgG given to adult mice
by gavage was simply destroyed by proteolysis that does not occur in
neonates.
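
The group comparisons in Fig. 2 are reported with two-sided Mann–Whitney U-tests. As a rough illustration of how such a P value arises for small groups, the sketch below enumerates every possible split of the pooled ranks (an exact permutation form of the test). The function names and the example measurements are invented for illustration, and the group sizes of 3 and 5 are an assumption; the actual n values are those indicated in the figure.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def exact_two_sided_p(group_a, group_b):
    """Exact two-sided P value from the rank sum of group_a over all splits."""
    pooled = list(group_a) + list(group_b)
    ranks = midranks(pooled)
    n_a, n_total = len(group_a), len(pooled)
    expected = n_a * (n_total + 1) / 2  # rank sum expected under the null
    observed_gap = abs(sum(ranks[:n_a]) - expected)
    splits = list(combinations(range(n_total), n_a))
    as_extreme = sum(
        1 for split in splits
        if abs(sum(ranks[i] for i in split) - expected) >= observed_gap - 1e-12
    )
    return as_extreme / len(splits)


# Hypothetical faecal IgG readings (arbitrary units), fully separated groups.
low = [0.8, 1.1, 1.4]
high = [5.2, 6.0, 6.3, 7.1, 7.9]
p_value = exact_two_sided_p(low, high)
print(round(p_value, 4))  # 0.0357
```

With complete separation of groups of sizes 3 and 5, the smallest attainable two-sided P value is 2/56 ≈ 0.0357. That coincides with the value reported for panel d, although the coincidence does not by itself establish the group sizes used.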

IgG binding to the microbial surface can drive immune-effector functions
such as complement-dependent bacteriolysis and opsonization.
Because flow cytometry-analysed ETEC 6 cells were coated with
IgG but not IgA in vivo, we hypothesized that maternal natural IgG was
the immunoglobulin class that provided protection against ETEC. We
synchronized the pregnancy of two μMT−/− female mice mated with different
μMT−/− male mice. On gestational day 18 and postpartum day 2,
one female received intraperitoneal injections of IgG (12 mg) purified
from specific-pathogen-free (SPF) wild-type mice; the other received
injections of only PBS (Fig. 2e). At 1 week of age, pups from these dams
were challenged by the oral-gastric route with 10’ CFU of ETEC 6. Pups




[Fig. 3 plot panels a-f. Y axes: anti-ETEC serum total Ig (OD405) or anti-ETEC serum IgG (OD405). X axes: serum dilutions. Legends, in panel order: SPF (n = 4), GF (n = 4), SPF absorbed by microbiota (n = 4); SPF (n = 6), GF (n = 6); mNab+ pups (n = 6), mNab− pups (n = 6); SPF (n = 3), GF (n = 3), SPF absorbed by Pantoea (n = 4).]

Fig. 3 | The commensal microbiota elicits antibodies that cross-react with
ETEC 6. a, Total immunoglobulin titres against ETEC 6 in serum from germ-free
(GF) and SPF adult female mice as well as in serum from SPF mice absorbed by
mouse microbiota. Data are representative of four independent experiments.
OD405, optical density at 405 nm. b, IgG titres against ETEC 6 in serum from
germ-free and SPF mice. c, Total immunoglobulin titres against ETEC 6 in serum
from 1-week-old neonatal mNab+ and mNab− mice obtained by reciprocal
breeding. d, IgG titres against ETEC 6 in serum from 1-week-old neonatal mNab+
and mNab− mice obtained by reciprocal breeding. e, Total immunoglobulin
titres against ETEC 6 in serum from germ-free mice, serum from SPF mice and
Pantoea-1-absorbed serum from SPF mice. f, IgG titres against ETEC 6 in serum
from germ-free mice, serum from SPF mice and Pantoea-1-absorbed serum from
SPF mice. Data are mean ± s.e.m. Specific n numbers are indicated in the figure.


born to the IgG-treated μMT−/− dam were highly protected and carried
an approximately 25-fold lower small-intestinal bacterial load than pups
born to the untreated dam (3.8 × 10* CFU per small intestine compared
with 9.4 × 10° CFU per small intestine; Fig. 2f). Thus IgG, in the absence
of IgA, provides measurable protection against ETEC 6 challenge in
nursing pups. We measured serum IgG titres in μMT−/− pups fed on a
μMT−/− dam given the same passive IgG treatment and found titres that
were comparable to those found in pups born to a μMT+/− dam (Fig. 2g).
Thus, supplementing IgG antibodies to a pregnant or postpartum dam
is sufficient to protect the pups that she nurses from ETEC 6 infection.
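
The reported 25-fold difference can be checked against the two bacterial loads quoted above. The sketch below is illustrative only: it uses the mantissas given in the text (3.8 and 9.4) and treats the gap between the two powers of ten as a parameter, `exponent_gap`, a name introduced here. A gap of one order of magnitude reproduces the stated figure.

```python
def fold_difference(high_mantissa, low_mantissa, exponent_gap):
    """Ratio of two loads written as mantissa x 10^k, given the gap in k."""
    return (high_mantissa / low_mantissa) * 10 ** exponent_gap


# Untreated dam (9.4 x 10^k+1) versus IgG-treated dam (3.8 x 10^k);
# the one-order-of-magnitude gap is an assumption made for this check.
fold = fold_difference(9.4, 3.8, exponent_gap=1)
print(round(fold, 1))  # 24.7
```

A result of about 24.7 is consistent with the "approximately 25-fold" wording; any other exponent gap would give a ratio of roughly 2.5 or 250 instead.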


Commensal Pantoea elicits mNabs protective against ETEC 


The protection against ETEC of pups born to or nursed by mNab+ dams
suggests that conventionally colonized (SPF) mice carry cross-reacting
natural antibodies against ETEC 6. Indeed, total ETEC-6-directed serum
immunoglobulin and IgG titres are significantly higher in SPF mice
than in germ-free mice (Fig. 3a, b), suggesting that the SPF commensal
microbiota induces antibodies that cross-react with ETEC 6 in dams.
Absorption of SPF mouse sera with faecal bacteria completely removed
ETEC cross-reactive antibodies (Fig. 3a). At 1 week of age, mNab+ pups
(born to μMT+/− dam) had substantial serum titres of ETEC-6-specific
total immunoglobulin and IgG antibodies (presumably cross-reacting
antibodies generated to commensal antigens), whereas sera from
mNab− (μMT−/− dam) pups contained no detectable immunoglobulin
that bound to this strain (Fig. 3c, d). Gram-negative bacteria of the family
Enterobacteriaceae were isolated through the culture of faeces from
μMT−/− dams. Mice from both our in-house breeding facility and The Jackson
Laboratory lacked viable lactose-fermenting Gram-negative bacteria
such as E. coli in their faeces. We could isolate only two lactose-non-fermenting
Gram-negative Enterobacteriaceae species (a Pantoea and
an Enterobacter species, each identified by 16S rRNA gene sequencing)
from μMT+/− dams. We used the Pantoea 1 strain to absorb antibodies
from mouse serum. Pantoea-1-absorbed SPF serum showed reduced
titres of ETEC-reactive total immunoglobulin and IgG (Fig. 3e, f).
These data support the hypothesis that some commensal microbiota
species elicit cross-reactive antibodies against ETEC.

We wondered whether mNabs in mice showed cross-reactivity
with other common enteric bacteria, including other pathogens and
probiotic microorganisms. We measured the titres of antibodies in
sera of germ-free and SPF mice that recognize E. coli Nissle, a human
commensal bacterial isolate that has been used as a probiotic and is
not present in the mouse gut, or Salmonella typhimurium, a pathogen
in both humans and mice. Sera from SPF mice have higher total immunoglobulin
titres to E. coli Nissle and S. typhimurium than sera of germ-free
mice (Extended Data Fig. 5a, b). This result suggests that some
commensal strains of the phylum Proteobacteria induce antibodies
that recognize other proteobacterial species and strains.

To determine whether a single commensal species is sufficient to
confer protection against ETEC 6 infection, we immunized germ-free
dams either with a formalin-killed commensal Pantoea 1 strain or with a
formalin-killed ETEC 6 strain and used unimmunized mice as a control
group; then all three groups of pups were infected with ETEC 6. We
found that pups born to germ-free dams immunized with Pantoea 1
were significantly more protected against ETEC 6 than pups born to
unimmunized germ-free dams (Fig. 4a-c). IgG collected from pups
born to Pantoea-1-immunized germ-free dams showed cross-reactivity
to ETEC 6 (Fig. 4d) and the enteric pathogen C. rodentium (Extended
Data Fig. 6). Furthermore, all of the commensal Enterobacteriaceae
family isolates from mice found in three different vivariums were cross-reactive
with the Pantoea anti-serum, but did not react with mouse or
human commensal strains of Staphylococcus or Bacteroides. Pups of
germ-free unimmunized dams had no detectable antibodies against
ETEC 6 or these other bacterial species (Fig. 4e and Extended Data
Fig. 6a, b). Western blot analysis of pronase-treated bacterial lysates
showed elimination of a band cross-reactive with anti-Pantoea IgG,
suggesting that this immunoreactive material is a protein (Extended
Data Fig. 6c). We also measured IgG and IgA antibody content in milk
samples from conventional SPF mice and found an IgG concentration
that was approximately threefold higher than the concentration of IgA
(Extended Data Fig. 6d); IgG titres in the milk of a given mouse dam
were higher against the stool microbiota of the homologous dam than
against that of a heterologous dam (Extended Data Fig. 6e). Collectively,
these data suggest that the commensal microbiota can induce cross-reactive,
protective antibodies against pathogens.


Discussion 

Of the many causes of death due to bacterial pathogens among chil- 
dren under 5 years old, acute infectious diarrhoea is surpassed only by 
pneumonia’. Neonates in developing countries have frequent diar- 
rhoeal episodes that result in high mortality rates; the major infectious 
agents, which account for around 1.5 million deaths annually, are ETEC, 
rotavirus, Vibrio cholerae and Shigella“. ETEC is a frequent cause of 
diarrhoea in infants under 2 years old’®. Epidemiological data indicate
that breast-feeding reduces overall rates of diarrhoea and mortality”°”’;

Fig. 4 | Immunization of dams with commensal microorganisms conveys
neonatal protection against pathogens. a, Survival of pups born to ETEC-6-
or Pantoea-1-immunized dams or unimmunized dams. Data are from two
individual experiments (first experiment, ETEC n = 12 mice, unimmunized
(non) n = 7 mice, Pantoea n = 4 mice; second experiment, ETEC n = 5 mice,
unimmunized n = 5 mice, Pantoea n = 4 mice). b, Liver total bacterial burdens
3 days after intraperitoneal ETEC 6 challenge. **P = 0.0025, one-way analysis of
variance (ANOVA) with Bonferroni post-test. Data are from two independent
experiments. c, Spleen bacterial burdens 3 days after intraperitoneal ETEC 6
challenge. **P = 0.0041, one-way ANOVA with Bonferroni post-test. Data are
from two independent experiments. d, Cross-reactivity against ETEC 6 of
serum IgG from pups born to germ-free dams with or without Pantoea 1
immunization. e, Western blot showing that serum IgG of pups born to a
Pantoea-1-immunized dam recognizes antigens in cellular lysates of members
of the Enterobacteriaceae family (ETEC 6, Pantoea 1 and Enterobacter). Lane 1,
Staphylococcus; lane 2, ETEC 6; lane 3, Pantoea 1; lane 4, Enterobacter. Blot is
detected with goat anti-mouse IgG antibody. Data are representative of three
independent experiments. For gel source data, see Supplementary Fig. 1.
b-d, Data are mean ± s.e.m.


however, the underlying mechanisms by which breast milk provides
protection are not clear. Our results suggest that breast-feeding by
mothers who lack specific immunity to ETEC may protect infants from
ETEC by delivering natural antibodies (elicited by the commensal
microbiota) that cross-react with this pathogen. The data presented
here on cross-species protection by antibodies generated to a
commensal organism are all based on mouse studies; further studies
to address their relevance in humans are important.

IgG present in the breast milk of a selected dam reacted more strongly
with the microbiota of that dam than with microbiota of other dams.
We hypothesize that commensal species probably vary in their ability
to induce cross-reacting antibodies that recognize any given pathogen,
such as ETEC. Beyond antigens shared by specific bacterial groups
(for example, lipopolysaccharides of Gram-negative bacteria), some
antigens can be expressed by diverse and phylogenetically distant
bacterial species, including commensal microorganisms²⁸. Moreover,
poly-reactive IgM can recognize both pathogenic and commensal bacteria
and affords some protection against pathogen challenge in mice”.
Thus, our study indicates that modulation of the maternal microbiota to
optimize the induction of cross-reactive antibodies that are protective
against important neonatal pathogens should be explored.

The role of secretory IgA in humoral responses to enteric pathogens
has been widely studied*”. The function of other antibody classes
(for example, IgG) at mucosal sites or in breast milk has attracted less
attention, primarily because IgG is thought to be present at lower
concentrations and to be less stable in mucosal secretions” *. The
consensus has been that secretory IgA in breast milk probably mediates
protection**. Human and rodent milk contains substantial amounts
of both secretory IgA and IgG³⁵,³⁶. In mice, microbiota-induced maternal
IgG in milk is present in the neonatal gut mucosa and is taken
up into the serum of breast-feeding neonates. Thus, breast-feeding
theoretically could provide IgG-mediated protection against invasive
pathogenic bacterial species at sites at which the effector mechanisms
function, such as mucosal or submucosal surfaces, bloodstream or
deeper tissues.

One important, previously unresolved question was whether
orally delivered IgG (acquired by neonates through the milk of the
mother) enters the bloodstream through a specific IgG transporter.
Previous studies detected such transport for certain immunoglobulin
classes” but did not clearly define the pathway for uptake. In vitro
studies yielded evidence that FcRn recognizes IgG and transports
it bidirectionally across an epithelial monolayer³⁷. Our work in mice
suggests that, dependent on FcRn, IgG in milk can enter the bloodstream
of neonatal mice and confer potent protection, presumably
through IgG-dependent effector functions such as complement classical
pathway-dependent bacteriolysis or opsonization³⁸. We also
uncovered an FcRn-dependent pathway for retro-transport of IgG
(Extended Data Fig. 7) relative to secretory processes mediated by the
polymeric immunoglobulin receptor, which transports IgA and IgM
from the basolateral to the apical surface and lumen of the intestine³⁹.
In the MDCK cell line, luminal-to-basolateral IgG transport reportedly
requires antibody-antigen complexes and FcRn⁴⁰; we did not
observe transport of IgG from lumen to serum in adult mice. Although
FcRn is thought to function bidirectionally, we observed that, in the
mouse, the bidirectionality may be subject to an age-dependent
temporal sequence (that is, in neonates, from lumen to submucosa;
in adults, from submucosa to lumen). Characterizing this transport
pathway in humans is a future priority because vaccination of women
may generate high-affinity IgG, protecting breast-fed neonates long
after antibodies received through the placenta have waned from the
bloodstream. We have not addressed whether this milk-mediated
gastrointestinal pathway for introducing therapeutic or preventive
IgG into the bloodstream is applicable to human neonatal infants. If
efficient and practical, this non-invasive approach offers advantages
over conventional passive-immunization strategies by avoiding needle
use in newborns, a practice that carries additional risk of disease
transmission.


Online content 


Any methods, additional references, Nature Research reporting sum- 
maries, source data, extended data, supplementary information, 
acknowledgements, peer review information; details of author con- 
tributions and competing interests; and statements of data and code 
availability are available at https://doi.org/10.1038/s41586-019-1898-4. 


1. Basha, S., Surendran, N. & Pichichero, M. Immune responses in neonates. Expert Rev. Clin. 
Immunol. 10, 1171-1184 (2014). 

2. Simon, A. K., Hollander, G. A. & McMichael, A. Evolution of the immune system in humans 
from infancy to old age. Proc. R. Soc. B 282, 20143085 (2015). 

3. Kamada, N., Chen, G. Y., Inohara, N. & Núñez, G. Control of pathogens and pathobionts by
the gut microbiota. Nat. Immunol. 14, 685-690 (2013). 

4. Carbonare, C. B., Carbonare, S. B. & Carneiro-Sampaio, M. M. S. Secretory 
immunoglobulin A obtained from pooled human colostrum and milk for oral passive 
immunization. Pediatr. Allergy Immunol. 16, 574-581 (2005). 

5. Hanson, L. A. R. & Korotkova, M. The role of breastfeeding in prevention of neonatal 
infection. Semin. Neonatol. 7, 275-281 (2002). 

6. Madoff, L. C., Michel, J. L., Gong, E. W., Rodewald, A. K. & Kasper, D. L. Protection of 
neonatal mice from group B streptococcal infection by maternal immunization with beta 
C protein. Infect. Immun. 60, 4989-4994 (1992). 

7. Zaman, K. et al. Effectiveness of maternal influenza immunization in mothers and infants. 
N. Engl. J. Med. 359, 1555-1564 (2008). 




8. Englund, J. A. et al. Transplacental antibody transfer following maternal immunization
with polysaccharide and conjugate Haemophilus influenzae type b vaccines. J. Infect. Dis.
171, 99-105 (1995).

9. Kearney, J. F., Patel, P., Stefanov, E. K. & King, R. G. Natural antibody repertoires:
development and functional role in inhibiting allergic airway disease. Annu. Rev.
Immunol. 33, 475-504 (2015).

10. Macpherson, A. J., de Agüero, M. G. & Ganal-Vonarburg, S. C. How nutrition and the
maternal microbiota shape the neonatal immune system. Nat. Rev. Immunol. 17, 508-517
(2017).

11. Chen, Y. et al. Microbial symbionts regulate the primary Ig repertoire. J. Exp. Med. 215,
1397-1415 (2018).

12. Englund, J. A. et al. Maternal immunization with influenza or tetanus toxoid vaccine for
passive antibody protection in young infants. J. Infect. Dis. 168, 647-656 (1993).

13. Boes, M., Prodeus, A. P., Schmidt, T., Carroll, M. C. & Chen, J. A critical role of natural
immunoglobulin M in immediate defense against systemic bacterial infection. J. Exp.
Med. 188, 2381-2386 (1998).

14. Ochsenbein, A. F. et al. Control of early viral and bacterial distribution and disease by
natural antibodies. Science 286, 2156-2159 (1999).

15. Baumgarth, N. et al. B-1 and B-2 cell-derived immunoglobulin M antibodies are
nonredundant components of the protective response to influenza virus infection.
J. Exp. Med. 192, 271-280 (2000).

16. Jayasekera, J. P., Moseman, E. A. & Carroll, M. C. Natural antibody and complement
mediate neutralization of influenza virus in the absence of prior immunity. J. Virol. 81,
3487-3494 (2007).

17. Zhou, Z. H. et al. The broad antibacterial activity of the natural antibody repertoire is due
to polyreactive antibodies. Cell Host Microbe 1, 51-61 (2007).

18. Caballero-Flores, G. et al. Maternal immunization confers protection to the offspring
against an attaching and effacing pathogen through delivery of IgG in breast milk. Cell
Host Microbe 25, 313-323 (2019).

19. Palmeira, P., Quinello, C., Silveira-Lessa, A. L., Zago, C. A. & Carneiro-Sampaio, M. IgG
placental transfer in healthy and pathological pregnancies. Clin. Dev. Immunol. 2012,
985646 (2012).

20. Masuda, A. et al. Fcγ receptor regulation of Citrobacter rodentium infection. Infect.
Immun. 76, 1728-1737 (2008).

21. Pyzik, M., Rath, T., Lencer, W. I., Baker, K. & Blumberg, R. S. FcRn: the architect behind the
immune and nonimmune functions of IgG and albumin. J. Immunol. 194, 4595-4603
(2015).

22. Israel, E. J. et al. Expression of the neonatal Fc receptor, FcRn, on human intestinal
epithelial cells. Immunology 92, 69-74 (1997).

23. Kotloff, K. L. et al. Burden and aetiology of diarrhoeal disease in infants and young
children in developing countries (the Global Enteric Multicenter Study, GEMS): a
prospective, case-control study. Lancet 382, 209-222 (2013).

24. Kotloff, K. L. et al. Global burden of diarrheal diseases among children in developing
countries: incidence, etiology, and insights from new molecular diagnostic techniques.
Vaccine 35, 6783-6789 (2017).




25. Kotloff, K. L. et al. The incidence, aetiology, and adverse clinical consequences of less
severe diarrhoeal episodes among infants and children residing in low-income and
middle-income countries: a 12-month case-control study as a follow-on to the Global
Enteric Multicenter Study (GEMS). Lancet Glob. Health 7, e568-e584 (2019).

26. Qadri, F., Svennerholm, A.-M., Faruque, A. S. G. & Sack, R. B. Enterotoxigenic Escherichia
coli in developing countries: epidemiology, microbiology, clinical features, treatment,
and prevention. Clin. Microbiol. Rev. 18, 465-483 (2005).

27. Thapar, N. & Sanderson, I. R. Diarrhoea in children: an interface between developing and
developed countries. Lancet 363, 641-653 (2004).

28. Skurnik, D., Cywes-Bentley, C. & Pier, G. B. The exceptionally broad-based potential of
active and passive vaccination targeting the conserved microbial surface polysaccharide
PNAG. Expert Rev. Vaccines 15, 1041-1053 (2016).

29. Le Gallou, S. et al. A splenic IgM memory subset with antibacterial specificities is
sustained from persistent mucosal responses. J. Exp. Med. 215, 2035-2053 (2018).

30. Wilmore, J. R. et al. Commensal microbes induce serum IgA responses that protect
against polymicrobial sepsis. Cell Host Microbe 23, 302-311 (2018).

31. Apter, F. M. et al. Analysis of the roles of antilipopolysaccharide and anti-cholera toxin
immunoglobulin A (IgA) antibodies in protection against Vibrio cholerae and cholera
toxin by use of monoclonal IgA antibodies in vivo. Infect. Immun. 61, 5279-5285 (1993).

32. Michetti, P., Mahan, M. J., Slauch, J. M., Mekalanos, J. J. & Neutra, M. R. Monoclonal
secretory immunoglobulin A protects mice against oral challenge with the invasive
pathogen Salmonella typhimurium. Infect. Immun. 60, 1786-1792 (1992).

33. Moor, K. et al. High-avidity IgA protects the intestine by enchaining growing bacteria.
Nature 544, 498-502 (2017).

34. Stuebe, A. The risks of not breastfeeding for mothers and infants. Rev. Obstet. Gynecol. 2,
222-231 (2009).

35. Goldsmith, S. J., Dickson, J. S., Barnhart, H. M., Toledo, R. T. & Eitenmiller, R. R. IgA, IgG,
IgM and lactoferrin contents of human milk during early lactation and the effect of
processing and storage. J. Food Prot. 46, 4-7 (1983).

36. Fouda, G. G. et al. HIV-specific functional antibody responses in breast milk mirror those
in plasma and are primarily mediated by IgG antibodies. J. Virol. 85, 9555-9567 (2011).

37. Dickinson, B. L. et al. Bidirectional FcRn-dependent IgG transport in a polarized human
intestinal epithelial cell line. J. Clin. Invest. 104, 903-911 (1999).

38. Bournazos, S. & Ravetch, J. V. Diversification of IgG effector functions. Int. Immunol. 29,
303-310 (2017).

39. Mostov, K. E. Transepithelial transport of immunoglobulins. Annu. Rev. Immunol. 12,
63-84 (1994).

40. Yoshida, M. et al. Human neonatal Fc receptor mediates transport of IgG into luminal
secretions for delivery of antigens to mucosal dendritic cells. Immunity 20, 769-783
(2004).


Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in 
published maps and institutional affiliations. 


© The Author(s), under exclusive licence to Springer Nature Limited 2020 


Methods 


Data reporting 

No statistical methods were used to predetermine sample size. The 
investigators were blinded to allocation during most of the experiments 
and outcome assessments. 


Mouse breeding strategy 
Reciprocal breeding was used to generate pups that were sufficient
or deficient in mNabs. μMT−/− pups in the mNab+ group were used to
evaluate the persistence of mNabs and μMT+/− pups in the mNab− group
were used to evaluate the emergence of endogenous antibodies.
μMT−/− mice (stock no. 002288) and wild-type C57BL/6J mice (stock
no. 000664) were purchased from The Jackson Laboratory and bred
to generate F₁ μMT+/− mice. F₁ μMT+/− female mice were bred with μMT−/−
males to generate F₂ progeny. F₂ or F₃ μMT+/− female × μMT−/− male breeding
and μMT−/− female × μMT+/− male breeding were synchronized to
generate mNab+ pups as well as mNab− pups. These pups were used for
studies of ETEC 6 infection, in which serum and mucosal antibody levels
were measured. FcRn” mice (Jackson Laboratory, stock no. 003982) 
anduMT* mice were used to generate the F, uMT” FcRn progeny. F, 
uMT FcRn* mice were then used to generate F, uMT’ FcRn” prog- 
eny. F, mice were used to generate uMT" FcRn* or uMT-FcRn‘“ (or 
uMTFcRn*") mice. Germ-free C57BL/6J mice were bred and main- 
tained in mouse facilities at Harvard Medical School. Germ-free mice 
were housed in standard isolators and were free of all bacteria, fungi, 
viruses and parasites; sterility was verified by regular-interval aer- 
obic and anaerobic cultures as well as PCR. All animal studies were 
approved by the IACUC of Harvard Medical School under animal protocol
IS:00000178-3. Mouse genotyping followed the Jackson Laboratory
genotyping protocol for stock no. 002288 and for stock no. 003982 
(The Jackson Laboratory). 
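
As an aid to reading the breeding scheme, the expected litter composition of a single-locus cross can be written out as a Punnett square. The sketch below assumes one heterozygous (+/−) and one homozygous-null (−/−) parent and simple Mendelian segregation; the `cross` helper and the allele labels are illustrative names introduced here, not part of the published protocol.

```python
from collections import Counter
from itertools import product


def cross(parent_a, parent_b):
    """Expected offspring genotype fractions for one locus.

    Each parent is a two-character string of alleles, e.g. "+-" or "--".
    """
    offspring = Counter()
    for allele_a, allele_b in product(parent_a, parent_b):
        # Sort so that "+/-" and "-/+" are counted as the same genotype.
        genotype = "/".join(sorted((allele_a, allele_b)))
        offspring[genotype] += 1
    total = sum(offspring.values())
    return {genotype: count / total for genotype, count in offspring.items()}


# Heterozygous x homozygous-null: half the pups are expected to be +/-.
print(cross("+-", "--"))  # {'+/-': 0.5, '-/-': 0.5}
```

Under this assumption the expected genotype fractions do not depend on which parent is the dam; in a reciprocal design, what changes between the two crosses is the maternal antibody status.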


Microbiota composition analysis 

Faecal contents were scraped off the intestines of 1-week-old pups and 
DNA was extracted with a QIAamp DNA Stool Mini Kit (Qiagen, 51604).
The V4 region of the 16S rRNA gene was amplified with paired-end 16S rRNA
gene primers 515F and 806R", and approximately 390-bp amplicons 
were purified and then subjected to multiplex sequencing (Illumina 
MiSeq, 251 nucleotides x 2 paired-end reads with 12-nucleotide index 
reads). Raw sequencing data were analysed with QIIME2 pipelines”. 
The feature table of gut microbiota was then used for alpha and beta 
diversity analysis, as well as taxonomic analysis and differential abun- 
dance testing. 
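
For readers unfamiliar with the diversity measures mentioned above, the following sketch computes one common alpha-diversity metric (the Shannon index) and one common beta-diversity metric (Bray–Curtis dissimilarity) from a feature table. It is plain Python for illustration and is not the QIIME2 implementation; which metrics the authors actually used is not stated here, and the counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity (natural log) of one sample's feature counts."""
    total = sum(counts)
    proportions = [count / total for count in counts if count > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples over the same features."""
    difference = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    combined = sum(a + b for a, b in zip(sample_a, sample_b))
    return difference / combined


# Hypothetical counts for three taxa in two pups.
pup_1 = [120, 30, 50]
pup_2 = [10, 150, 40]
print(round(shannon_index(pup_1), 3))
print(round(bray_curtis(pup_1, pup_2), 3))
```

Alpha diversity summarizes richness and evenness within a single sample, whereas beta diversity compares composition between samples.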


Enteric pathogen infection of neonatal mice 

In the intestinal infection model, to estimate bacterial burden, E. coli
strain ETEC 6 (10’ CFU in 50 μl PBS buffer) was administered orally
by gavage to 6-day-old pups using an insulin needle connected to
polyethylene tubing (Intramedic, 4274010). (The ETEC 6 strain was
a gift from F. Qadri; genome sequence, NCBI BioSample Accession
number SAMN12263012.) Animals were monitored closely and euthanized
20 h later, and bacterial burdens (CFU per organ) were determined.
MacConkey agar plates with specific antibiotics were used for
the cultivation of ETEC 6. To estimate survival, ETEC 6 (10° CFU) was 
administered orally by gavage to pups, and the condition of the mice 
was closely monitored. A moribund condition was recorded as the 
experimental end point, and survival (defined as the percentage of 
animals that were alive compared to those that were moribund or dead 
20 h after challenge) was recorded. In the systemic infection model, 
ETEC 6 (10’ CFU) was administered intraperitoneally to 10-12-day-old 
pups, and the condition of the mice was closely monitored. A moribund 
condition was recorded as the experimental end point, and the survival 
at 3 days after infection was recorded. Moribund 6-7-day-old animals 
were defined as those that were grey rather than pink in colour and 


were not responsive to manual stimulation; older mice (older than 
10 days of age) were defined as moribund if they displayed abnormal 
posture, rough hair coat, exudate around eyes and/or nose, skin lesions, 
abnormal breathing, difficulty with ambulation, low food and water 
intake or self-mutilation. 
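
The survival end point defined above reduces to a simple proportion. The sketch below reads the definition as alive animals divided by all challenged animals (alive plus moribund or dead) at the scoring time; that reading, the function name and the example counts are assumptions made for illustration.

```python
def survival_percent(alive, moribund_or_dead):
    """Percentage of challenged animals still alive at the scoring time."""
    total = alive + moribund_or_dead
    if total == 0:
        raise ValueError("group contains no animals")
    return 100.0 * alive / total


# Hypothetical litter: 1 pup alive and 4 moribund or dead at 20 h.
print(survival_percent(alive=1, moribund_or_dead=4))  # 20.0
```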


Isolation of mouse-gut commensal Enterobacteriaceae bacteria 
Homogenates of small intestine were plated on MacConkey agar plates 
without antibiotics and incubated aerobically at 37 °C overnight. Colo- 
nies were purified and DNA was extracted. The 16S rRNA gene was ampli- 
fied by PCR and sequenced with 27F and 1492R primers”. 


RNA sequencing 

Illumina sequencing libraries were built using the Ovation RNA-Seq
System V2 (NuGEN) according to the manufacturer’s instructions,
and were submitted to the Harvard Biopolymers Facility for sequencing
on the Illumina NextSeq 500, resulting in 287 million high-quality
50-nucleotide paired-end reads. Differential expression analysis was
performed with the Bioconductor package DESeq2“.


Total IgG and IgA enzyme-linked immunosorbent assays 

Sera and mucosal antibodies of the neonates were measured with a
mouse IgG enzyme-linked immunosorbent assay (ELISA) kit (Abcam,
ab157719) and a mouse IgA ELISA kit (Abcam, ab157717). Serum samples
were diluted in the 1:20,000-1:40,000 range for IgG detection. Mucosal 
samples—from either the small intestine or the colon—were homog- 
enized in 1 ml of PBS and centrifuged. Only supernatants were used 
for ELISA. For measurement of faecal antibodies, faecal pellets were 
weighed and resuspended as 100 mg ml⁻¹ stock solutions in PBS buffer
before further dilution for ELISA. Results were read with a BioTek Syn- 
ergy HT Multi Mode Microplate Reader at OD,;,. For absorption assays, 
formalin-fixed commensal bacteria or Pantoea cells (10° CFU) were 
incubated with 100 μl sera, and after 1 h bacterial cells were removed
by centrifugation. Absorbed serum samples were diluted and used for 
ELISA as described below. 
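
Because samples are diluted before measurement, the kit reading has to be scaled back to the original sample. A minimal sketch of that arithmetic follows, assuming the reading falls within the kit's standard curve; the function names and the example readings are hypothetical, and only the dilution range and the 100 mg per ml faecal stock come from the text.

```python
def undiluted_concentration(measured, dilution_factor):
    """Scale a reading on a diluted sample back to the undiluted sample."""
    return measured * dilution_factor


def faecal_igg_ug_per_g(igg_ug_per_ml_stock, stock_mg_per_ml=100):
    """Convert IgG per ml of faecal stock into micrograms per gram of faeces."""
    return igg_ug_per_ml_stock * 1000 / stock_mg_per_ml


# Hypothetical serum reading of 25 ng/ml on a 1:40,000 dilution.
serum_ng_per_ml = undiluted_concentration(25.0, 40_000)
serum_mg_per_ml = serum_ng_per_ml / 1_000_000
print(serum_mg_per_ml)  # 1.0

# Hypothetical faecal stock containing 5 ug IgG per ml.
print(faecal_igg_ug_per_g(5.0))  # 50.0
```

In this example a reading of 25 ng per ml at a 1:40,000 dilution corresponds to 1 mg of IgG per ml of serum, the order of magnitude described for FcRn-sufficient pups.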


ETEC cross-reactivity ELISA 

The cross-reactivity of mouse serum antibodies with ETEC 6 cells was
assessed by whole-cell ELISA. In brief, ETEC 6 bacterial cells were treated
with 0.5% formalin at room temperature for 2 h and then washed twice
with sodium-carbonate-coating buffer. About 10° CFU per 100 μl of
fixed ETEC 6 cells in coating buffer were added to each well of a NUNC
Maxisorp ELISA plate (Thermo Fisher, 44-2404-21) and then incubated
overnight at 4 °C. Wells were washed with PBST (PBS and 0.05% Tween-20)
and blocked with 5% nonfat milk in PBST buffer for 2 h at room
temperature. Next, 2% nonfat milk in PBST was used for serum dilutions;
the addition of 50 μl of diluted serum to each well (as replicates) was
followed by incubation at room temperature for 1 h. The following
secondary antibodies were used at a dilution of 1:2,000: anti-mouse
immunoglobulin-HRP (SouthernBiotech, 1010-05) or anti-mouse IgG-HRP
(SouthernBiotech, 1030-05). Super Aqua Blue substrate (Thermo
Fisher, 00-4203-58) was used for colour detection. Titres of antibody
to ETEC 6 were read with a BioTek Synergy HT Multi Mode Microplate
Reader at OD405.
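
The figures present these readings as OD405 curves across serum dilutions. One common way to condense such a curve into a single number is an endpoint titre, sketched below. This is an assumption for illustration only: the text does not state that an endpoint titre or a cutoff was used, and the twofold dilution series, the cutoff and the OD values are all hypothetical.

```python
def twofold_series(first_dilution, steps):
    """Reciprocal dilutions of a twofold series, e.g. 50, 100, 200, ..."""
    return [first_dilution * 2 ** step for step in range(steps)]


def endpoint_titre(dilutions, od_values, cutoff):
    """Highest reciprocal dilution whose OD exceeds the cutoff, else None."""
    positive = [d for d, od in zip(dilutions, od_values) if od > cutoff]
    return max(positive) if positive else None


dilutions = twofold_series(50, 6)            # 1:50 up to 1:1,600
od_405 = [1.20, 0.90, 0.60, 0.30, 0.15, 0.08]  # hypothetical readings
print(endpoint_titre(dilutions, od_405, cutoff=0.2))  # 400
```

Comparing whole curves, as the figures do, avoids the dependence on an arbitrary cutoff that an endpoint titre introduces.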


Construction of the ETEC-GFP strain 

The plasmid pUC18T-mini-Tn7T-Tp-gfpmut3 was electronically trans- 
formed into ETEC 6 competent cells. The successful transformant was 
selected, confirmed to be positive for GFP by PCR as well as by flow 
cytometry, and designated ETEC-GFP. 


Detection of antibody deposition on in vivo-recovered ETEC cells 
To analyse the IgG and IgA coating of ETEC 6 bacteria ex vivo, mice
infected with ETEC-GFP were euthanized 18-20 h after infection.
Small-intestine contents were scraped off, washed and filtered (filter
pore size, 5 μm; Pall Acrodisc, 4650) for recovery of bacteria. Faecal
bacteria were resuspended in PBS with a cocktail of protease inhibitors
(Roche, 11873580001) and incubated with shaking at 37 °C for
5-10 min to facilitate GFP maturation and detection by flow cytometry.
Faecal bacteria were blocked with 2% BSA in PBS buffer and stained with
diluted (1:100) anti-mouse IgG-647 (Biolegend, 405322) or anti-mouse
IgA-APC (SouthernBiotech, 1165-11). Isotype controls were Alexa Fluor
647 goat IgG (Biolegend, 403006) and rat IgG1-APC (SouthernBiotech,
0116-11). Stained bacteria were washed with PBS and analysed by
MACSQuant (Miltenyi Biotec). Data were analysed with FlowJo software
(Tree Star).


Exogenous antibody supplementation in µMT−/− pups

The breeding of two µMT−/− female mice was synchronized to generate
two litters of pups born within a 12-h time frame. Purified SPF mouse
IgG (12 mg; mu-003-C.05, ImmunoReagents) was injected intraperitoneally
into one pregnant µMT−/− female at gestation day 18 and again at
postpartum day 2. The resulting two litters of µMT−/− pups were used
for ETEC 6 infection at 1 week of age.


Cross-fostering experiment 

The breeding of a µMT+/− female with a µMT−/− male and the breeding
of a µMT−/− female with a µMT+/− male were synchronized to generate
pups born on the same day, for subsequent cross-fostering. After 1 week
of cross-fostering, pups were used for ETEC 6 infection experiments.
Serum and mucosal samples of infected pups were collected for
measurement of IgG and IgA titres.


Commensal immunization 

Commensal species of Enterobacteriaceae were isolated from SPF
mice. One was identified as a Pantoea species (referred to as Pantoea 1).
This strain was grown in LB broth to an OD600 of 1.0; cells were then
collected by centrifugation and treated with 1% formalin for 1 h before
three washes with PBS buffer. Formalin-fixed Pantoea 1 (10⁷ CFU in
100 µl of PBS) was injected intraperitoneally into mice for priming.
After 3 weeks, 10⁷ CFU of formalin-fixed Pantoea 1 was again injected
intraperitoneally as an immunological boost. Sera were collected
from 2-week-old pups for antibody titre determination and used for
immunoblotting analysis.


Immunoblotting of bacterial lysates with immunized serum 

After growth of ETEC 6, Salmonella and Pantoea 1 isolates in LB broth,
bacteria (10° CFU) were collected by centrifugation, lysed with lysis
buffer, and run on NuPAGE 4–12% Bis-Tris protein gels (Invitrogen,
NP0335BOX) at 180 V. Separated products were transferred to nitrocellulose
membranes (iBlot 2 NC mini stacks; Invitrogen, IB23002) with an
iBlot transfer device (Invitrogen, IB21001). The nitrocellulose membranes
were reacted with immunized mouse serum at a dilution of 1:500
and then blotted with 1:10,000 diluted IRDye 680RD goat anti-mouse
IgG secondary antibody (LI-COR, 926-68070). Images were taken with
an Odyssey Imaging system (LI-COR Biosciences).


Pronase treatment of bacterial lysates 

ETEC 6, Enterobacter and Pantoea 1 were grown in LB broth to an OD600
of 1.0, collected, washed three times with PBS buffer and resuspended
in half the volume of the original bacterial culture. Bacterial suspensions
were lysed three times (15-s duration) with a Branson Ultrasonics probe
sonicator. Pronase was added to bacterial lysates to final concentrations
of 0, 0.2, 1 and 2 mg ml−1, with subsequent incubation at 42 °C for 1 h.
The digested bacterial lysates were used for immunoblotting analysis.
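The final pronase concentrations above imply a small dilution calculation when a concentrated stock is added to a fixed lysate volume. The following sketch shows that arithmetic. The stock concentration (20 mg ml−1), the lysate volume (100 µl) and the function name are hypothetical, as the text does not report them.

```python
def pronase_volume_ul(lysate_ul, stock_mg_ml, final_mg_ml):
    """Volume of stock (µl) to add so the mixture reaches final_mg_ml.

    Solves stock * v / (lysate + v) = final for v, so the extra volume
    contributed by the added stock is accounted for.
    """
    if final_mg_ml >= stock_mg_ml:
        raise ValueError("final concentration must be below the stock concentration")
    return final_mg_ml * lysate_ul / (stock_mg_ml - final_mg_ml)


LYSATE_UL = 100.0   # hypothetical lysate volume per reaction
STOCK_MG_ML = 20.0  # hypothetical pronase stock concentration

# Final concentrations taken from the text: 0, 0.2, 1 and 2 mg/ml.
volumes = {c: round(pronase_volume_ul(LYSATE_UL, STOCK_MG_ML, c), 2)
           for c in (0, 0.2, 1, 2)}
for conc, vol in volumes.items():
    print(f"{conc} mg/ml final: add {vol} µl of stock")
```

Topping each tube up to a common volume with buffer would keep the lysate concentration equal across conditions; that step is left out of the sketch.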


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


16S rRNA gene profiling data and the ETEC 6 genome are available
in the NCBI database under BioProject PRJNA577743 and BioSample
SAMN12263012, respectively.

Extended Data Fig. 1 | Persistence and development of maternal antibodies.
The genotypes of mating pairs are indicated under the large mouse cartoons;
the small mouse cartoons represent neonates born to the indicated dams and
their genotypes. Red symbols denote the presence in neonates of antibodies
that either were acquired transplacentally from µMT+/− mothers or, in the
case of µMT+/− pups, were generated endogenously after 4 weeks of age.
a, Reciprocal breeding scheme to study maternal antibody persistence and
development. b, Serum IgG concentration in 1–8-week-old pups. Data are
shown as µg ml−1. n = 5–15 mice in each breeding group for every week of
1–8 weeks. Further details are provided in the Source Data. c, IgA
concentrations in small-intestine (SI) and colon (CO) homogenates from
1-week-old pups. Data are shown as µg per small intestine or colon. d, Faecal IgA
concentration in 2–8-week-old pups. Data are shown as µg per g of faeces.
n = 6–13 mice in each breeding group for every week of 2–8 weeks. Further
details are provided in the Source Data. e, Serum IgM concentration in
1-week-old pups. Data are shown as µg ml−1. f, IgG concentration in small-intestine and
colon homogenates from 1-week-old pups. Data are shown as µg per small
intestine or colon. g, Faecal IgG concentration in 2–8-week-old pups. Data are
shown as µg per g of faeces. n = 5–9 mice in each breeding group for every week
of 2–8 weeks. Further details are provided in the Source Data. b–g, Data are
mean ± s.e.m. Specific n numbers are shown in the figure.

Extended Data Fig. 2 | Comparison of mNAb+ and mNAb− pups. a, Survival of
2-week-old mNAb+ and mNAb− pups on day 1 after intraperitoneal challenge
with 10⁷ CFU of ETEC 6. mNAb+ group, n = 6 pups; mNAb− group, n = 15 pups.
b, 16S rRNA gene analysis of the composition of the microbiota in 1-week-old
reciprocally bred pups. Data are the average of 8 or 9 individual pups from each
group; mNAb+ group, n = 9 pups; mNAb− group, n = 8 pups. c, Transcriptome
analysis of small intestines of ETEC-6-infected mNAb+ and mNAb− pups using
RNA sequencing. n = 8 mNAb+ pups; n = 4 mNAb− pups. Specific n numbers are
shown in the figure.

Extended Data Fig. 3 | Comparison of pups from the cross-fostering
experiment. For all panels, cross-fostering of neonates is denoted by
horizontal arrows that provide the genotype of the pup followed by the
genotype of the fostering dam. a, Cross-fostering experimental scheme. The
genotypes of dams are indicated under the large mouse cartoons; the small
mouse cartoons represent neonates that are born to those dams and have the
same genotype as their mothers. Thicker arrows define the mother that
fostered the indicated neonates. Red symbols denote the presence in neonates
of antibodies that were acquired transplacentally from their µMT+/− mothers or,
in the case of µMT−/− pups, from a µMT+/− fostering dam. b, Colon IgG
concentration of 1-week-old cross-fostered pups. Data are shown as µg per
colon. c, Colon IgA concentration of 1-week-old cross-fostered pups. Data are
shown as µg per colon. d, Fostering scheme of µMT“ pups cross-fostered by a
µMT™ dam. e, Survival of fostered µMT“ pups at 20 h after ETEC infection. In
the first experiment, n = 5 µMT™ pups were fostered by µMT” dams; n = 5 µMT
pups were fostered by µMT“ dams. In the second experiment, n = 6 µMT" pups
were cross-fostered by µMT” dams; n = 8 µMT“ pups were fostered by µMT
dams. b,c, Data are mean ± s.e.m. Specific n numbers are shown in the figure.

Extended Data Fig. 4 | Relative concentrations of all subclasses of IgG
between dams and pups. a–e, Serum IgG subclass concentrations in
µMT−/− FcRn+/+ (or FcRn+/−) and µMT−/− FcRn−/− pups fostered by µMT dams for
1 week. f–j, Relative concentrations of all subclasses of IgG between dam and
pups. k, Adult (8-week-old) mice were orally gavaged with 5 mg of IgG, and
serum IgG concentrations were quantified as ng ml−1. NS, no significant
difference; calculated using a Mann–Whitney U-test. Specific n numbers are
shown in the figure.
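The Mann–Whitney U-test named in the caption above compares two groups by rank rather than by raw value. As an illustrative sketch only (the function names and example values are hypothetical, and statistical packages may apply tie corrections or approximations that differ from this enumeration), the U statistic and an exact two-sided P value for small groups can be computed as follows.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, exact two-sided P) by enumerating all group labellings."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    offset = n1 * (n1 + 1) / 2
    u_obs = sum(ranks[:n1]) - offset
    centre = n1 * n2 / 2  # expected U under the null hypothesis
    extreme = total = 0
    for combo in combinations(ranks, n1):
        u = sum(combo) - offset
        total += 1
        # Count labellings at least as far from the centre as the observed U.
        if abs(u - centre) >= abs(u_obs - centre) - 1e-9:
            extreme += 1
    return u_obs, extreme / total


# Hypothetical concentrations for two groups of four pups each.
group_a = [1.2, 1.5, 1.1, 1.4]
group_b = [2.3, 2.9, 2.5, 3.1]
u_stat, p_value = mann_whitney_u(group_a, group_b)
print(u_stat, round(p_value, 4))
```

Full enumeration is practical only for small groups such as those used here; for larger samples a normal approximation is the usual substitute.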


[Figure: panels a and b plot readings against serum dilutions for SPF (a, n = 8; b, n = 5) and GF (a, n = 4; b, n = 5) mouse serum.]

Extended Data Fig. 5 | Serum from conventionally colonized (SPF) mice
broadly recognizes human commensal bacteria and other enteric
pathogens. a, Total immunoglobulin titres against E. coli strain Nissle 1917 in
germ-free and SPF mouse serum. b, Total immunoglobulin titres against
Salmonella typhimurium in germ-free and SPF mouse serum. Specific n
numbers are shown in the figure.

Extended Data Fig. 6 | Characteristics of commensal-immunized and
unimmunized serum. a, Western blot analysis of serum IgG from pups born to
Pantoea-1-immunized SPF mice shows epitopes of ETEC 6, Pantoea 1 and
Enterobacter, similar in size to those in serum from pups born to
Pantoea-1-immunized germ-free mice. b, Serum IgG of pups born to Pantoea-immunized
germ-free dams cross-reacts with different Enterobacteriaceae isolates from
different facilities. 1, Harvard SGM Pantoea; 2, ETEC 6; 3, Harvard SGM
Enterobacter; 4, Bacteroides fragilis NCTC9343; 5, Charles River B6 Proteus
mirabilis; 6, Charles River B6 E. coli isolate 1; 7, Charles River B6 E. coli isolate 2;
8, Charles River CD1 E. coli isolate 1; 9, Charles River CD1 E. coli isolate 2; 10,
Taconic B6 E. coli isolate 1; 11, Taconic B6 E. coli isolate 2; 12, Taconic B6 Proteus
isolate; 13, Taconic B6 Enterobacter isolate; 14, C. rodentium. c, Pronase-treated
bacterial lysates blotted with serum IgG of pups born to Pantoea-immunized
germ-free dams. The concentrations of pronase are specified in the figure.
a–c, Proteins were detected using a goat anti-mouse IgG antibody. For gel
source data, see Supplementary Fig. 1. d, Mouse milk IgG and IgA concentrations.
Data are shown in µg ml−1. e, Mouse milk IgG titre against microbiota. Each line
represents an independent mouse. d, Data are mean ± s.e.m.

Extended Data Fig. 7 | Schematic summary of the findings in this study. Top,
mNAbs induced by commensal microbiota in dams were transferred to
neonates through the breast milk. Cross-reacting mNAbs (especially IgG
antibodies) were detected that bound to the pathogenic, non-indigenous
bacterial species ETEC and correlated with protection against disease in pups
challenged with ETEC. IgG antibodies were also shown to be transported from
the milk to the bloodstream of pups by a process that we call IgG
retro-transport. Bottom, mNAbs react with many commensal species and among
them an Enterobacteriaceae isolate (Pantoea) was found to induce antibodies
that cross-react with ETEC. The immunogenicity of this commensal species is
hypothesized to be a result of local antigen-sampling processes that involve
dendritic cells and uptake by Peyer’s patch germinal centres. This ultimately
leads to the induction of high-affinity IgGs directed against a Pantoea antigen
that cross-reacts with ETEC. IgG was also shown to be transported from the
bloodstream to the intestinal lumen by FcRn in adult mice. Illustrations were
created with BioRender (https://biorender.com/).

Data collection: No data collection software was used.

Data analysis: 1. GraphPad Prism 8 was used to analyze bacterial burden, mouse survival, the antibody titer absorption assay and ELISA results.
3. RNA-seq differential expression analysis was performed using the Bioconductor package DESeq2.

Sample size: Statistical methods were not used to determine sample size. In this exploratory study, the magnitude of experimental variation between
animals could not be predicted from current knowledge. The group sizes (at least three animals per treatment group) represent the
minimum number of animals needed to reach statistical significance (p < 0.05) between experimental groups.

Data exclusions: No data were excluded.

Replication: Most experiments were repeated 2–6 times and each experiment involved at least 3 mice. Experimental results were robust and
reproducible.

Randomization: Owing to the nature of the experimental design, randomization of animals was not relevant to our study.

Blinding: The investigators were blinded to group allocation during data collection and analysis: one investigator recorded bacterial
burden or survival in the mice and another genotyped all the mice. The two combined their results only after each had finished their analysis.


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems: Antibodies; Eukaryotic cell lines; Palaeontology; Animals and other organisms; Human research participants; Clinical data

Methods: ChIP-seq; Flow cytometry; MRI-based neuroimaging


Antibodies 


Antibodies used: The following antibodies were used (Name / Clone / Cat. no. / Dilution / Manufacturer):
Goat anti-mouse Ig-HRP / Polyclonal / Cat. 1010-05 / 1:2000 / Southern Biotech
Goat anti-mouse IgG-HRP / Polyclonal / Cat. 1030-05 / 1:2000 / Southern Biotech
Alexa Fluor 647 goat anti-mouse IgG / Poly4053 / Cat. 405322 / 1:100 / Biolegend
Alexa Fluor 647 goat IgG / Poly24030 / Cat. 403006 / 1:100 / Biolegend
Rat anti-mouse IgA-APC / 11-44-2 / Cat. 1165-11 / 1:100 / Southern Biotech
Rat IgG1-APC / KLH/G1-2-2 / Cat. 0116-11 / 1:100 / Southern Biotech
IRDye 680RD goat anti-mouse IgG secondary antibody / Cat. 926-68070 / 1:10000 / LI-COR
The following ELISA kits and antibody were used:
Mouse IgG ELISA Kit (Abcam, Cat. ab157719)
Mouse IgA ELISA Kit (Abcam, Cat. ab157717)
Mouse IgM ELISA Kit (Abcam, Cat. ab133047)
Mouse IgG1 ELISA Kit (Abcam, Cat. ab133045)
Mouse IgG2a ELISA Kit (Abcam, Cat. ab133046)
Mouse IgG2b ELISA Kit (Abcam, Cat. ab136941)
Mouse IgG2c ELISA Kit (Abcam, Cat. ab157720)
Mouse IgG3 ELISA Kit (Abcam, Cat. ab157721)
Purified SPF mouse IgG / Cat. mu-003-C.05 / ImmunoReagents

Validation: All antibodies and ELISA kits were obtained commercially from Abcam, Southern Biotech, Biolegend, ImmunoReagents or LI-COR.
Validation data are available on the manufacturers' websites.




Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals: Mus musculus, C57BL/6, neonates and pups up to 8 weeks old. µMT−/− mice (stock number 002288), FcRn−/− mice (stock
number 003982) and WT C57BL/6J mice (stock number 000664) were purchased from Jackson Laboratory and maintained at the
Harvard Medical School Seely Mudd animal facility. CD1 mice (strain code 022) and C57BL/6 mice (strain code 027) were
purchased from Charles River and maintained at the same facility. C57BL/6 mice
(nomenclature: C57BL/6NTac) were purchased from Taconic and maintained at the same facility.
Wild animals: The study did not involve wild animals.
Field-collected samples: The study did not involve samples collected from the field.
Ethics oversight: All animal studies were approved by the IACUC of Harvard Medical School under animal protocol 1S:00000178-3.


Note that full information on the approval of the study protocol must also be provided in the manuscript. 




Flow Cytometry 


Plots 


Confirm that: 


The axis labels state the marker and fluorochrome used (e.g. CD4-FITC). 


The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a ‘group’ is an analysis of identical markers). 


All plots are contour plots with outliers or pseudocolor plots. 


A numerical value for number of cells or percentage (with statistics) is provided.


Methodology 


Sample preparation: Mouse small-intestine contents were scraped, washed and filtered through a 5-µm filter (Pall Acrodisc, Cat. 4650) to recover
bacteria. Fecal bacteria were resuspended in PBS with a cocktail of protease inhibitors (Roche, Cat. 11873580001) and incubated
with shaking at 37 °C for 5–10 min to facilitate GFP maturation and detection by flow cytometry. Fecal
bacteria were blocked with 2% BSA in PBS buffer and stained with 1:100 diluted anti-mouse IgG-647 (Biolegend, Cat. 405322) or
anti-mouse IgA-647. Stained bacteria were washed with PBS and analyzed by MACSQuant (Miltenyi Biotec). Data
were analyzed with FlowJo software (v10.6.0).

Instrument: MACSQuant (Miltenyi Biotec)
Software: Data were analyzed with FlowJo software (Tree Star)




metrics. 
Treatment with immune checkpoint blockade (ICB) has revolutionized cancer
therapy. Until now, predictive biomarkers and strategies to augment clinical
response have largely focused on the T cell compartment. However, other immune
subsets may also contribute to anti-tumour immunity, although these have been
less well-studied in ICB treatment. A previously conducted neoadjuvant ICB trial in
patients with melanoma showed via targeted expression profiling that B cell
signatures were enriched in the tumours of patients who respond to treatment versus 
non-responding patients. To build on this, here we performed bulk RNA sequencing 
and found that B cell markers were the most differentially expressed genes in the 
tumours of responders versus non-responders. Our findings were corroborated using 
a computational method (MCP-counter) to estimate the immune and stromal
composition in this and two other ICB-treated cohorts (patients with melanoma and 
renal cell carcinoma). Histological evaluation highlighted the localization of B cells 
within tertiary lymphoid structures. We assessed the potential functional 
contributions of B cells via bulk and single-cell RNA sequencing, which demonstrate 
clonal expansion and unique functional states of B cells in responders. Mass 
cytometry showed that switched memory B cells were enriched in the tumours of 
responders. Together, these data provide insights into the potential role of B cells and 
tertiary lymphoid structures in the response to ICB treatment, with implications for 
the development of biomarkers and therapeutic targets. 


Immunotherapy has afforded patients with melanoma and other 
cancers the potential for long-term survival, and we are beginning 
to gain insight into the mechanisms of therapeutic responses as well 
as biomarkers of response and resistance. Considerable progress has 
been made in this regard, with the identification of several validated 
biomarkers, particularly for ICB therapy’. It is clear that cytotoxic 
T cells have a dominant role in responses to ICB and other forms of 
immunotherapy; however, there is a growing appreciation of other 
components of the tumour microenvironment that may influence 
the therapeutic response—including myeloid cells and other subsets 
of immune cells”. 


The list of affiliations appears at the end of the paper. 


Tumour-infiltrating B cells have been identified, but their overall
functional role in cancer is incompletely understood: some
studies suggest that they are tumour-promoting, whereas others show
a positive association with improved cancer outcomes, particularly
when they are found in association with organized lymphoid aggregates
known as tertiary lymphoid structures (TLSs).

TLSs have been identified within a wide range of human cancers 
at all stages of disease, in primary as well as metastatic lesions, but 
their presence is highly variable between cancer types as well as 
between patients. Considerable heterogeneity also exists in the
cellular constituents of TLSs and their location within tumours, and
this may influence the overall effect on anti-tumour immunity and
outcome. These TLS structures are not merely a surrogate marker
of a brisk immune response; instead, it is thought that they actively
modulate anti-tumour immune activity. In this regard, the benefit of
a high CD8+ T cell density within a tumour is abrogated in the absence
of TLS-associated dendritic cells. Mature TLSs exhibit evidence for
the formation of germinal centres, and oligoclonal B cell responses
have previously been identified in cutaneous melanoma and metastases,
which suggests an active humoral anti-tumour response within
TLSs that is driven by B cells. Notably, although preliminary evidence
suggests an association between responses to ICB and the presence of
B cells, the precise role of B cells (and in particular TLSs) in response
to ICB remains unclear.

A phase 2 clinical trial of neoadjuvant treatment with ICB in patients
with high-risk resectable (clinical stage III or oligometastatic stage IV) 
melanoma was recently conducted to assess the safety and feasibility 
of this treatment in this patient population (NCT02519322). Notably,
longitudinal tumour samples were taken in the context of therapy, and 
molecular and immune profiling was performed to gain insight into 
the mechanisms of the therapeutic response and resistance. In these 
studies, known and novel biomarkers of response were identified, and 
targeted protein expression profiling (via Nanostring Digital Spatial 
Profiling) revealed significantly higher expression of B cell markers 
in samples before treatment (baseline) and on-treatment samples of 
responders to ICB.


B cells found in the tumours of responders


To gain a deeper understanding of potential mechanisms of thera- 
peutic response to ICB, we performed RNA sequencing (RNA-seq) in 
longitudinal tumour samples from this patient cohort. In these studies, 
significantly higher expression of B-cell-related genes such as MZB1,
JCHAIN and IGLL5 was observed in patients that respond to ICB treatment
versus non-responding patients (‘responders’ and ‘non-responders’,
hereafter) at baseline (P < 0.001) with over-representation of these
genes compared to T cells and other immune markers (with evaluable
tumours from seven responders and nine non-responders) (Fig. 1a, b,
Supplementary Tables 1, 2). Other genes that are expected to alter the
function of B cells were also significantly enriched in responders versus
non-responders, such as FCRL5, IDO1, IFNG and BTLA. Low tumour
purity was observed in some samples, particularly in the context of 
an effective therapeutic response, limiting conventional analysis of 
RNA-seq data. To address this, we next performed a more focused 
investigation of the tumour immune microenvironment using the 
microenvironment cell populations (MCP)-counter method on RNA-seq
data in baseline and on-treatment tumour samples, focusing more
specifically on immune-related genes (Supplementary Table 3), which
allowed inclusion of samples with low tumour purity (10 responders
and 11 non-responders at baseline, 9 responders and 11 non-responders
on-treatment). In these analyses, we again observed enrichment of
a B cell signature in responders versus non-responders at baseline
and early on-treatment (P = 0.036 and 0.038, respectively). Notably,
these analyses included samples from patients with nodal and extranodal
disease with no obvious contribution based on the site of disease
(Fig. 1c, Extended Data Figs. 1a, b, 2a, Supplementary Tables 4, 13),
which suggests that B cell signatures were not merely related to the 
presence of these tumours within lymph nodes. Findings of high B 
cell lineage scores in responders were replicated in samples from an 
additional cohort of patients with melanoma treated with neoadjuvant 
versus adjuvant checkpoint blockade (ClinicalTrials.gov identifier 
NCT02437279, OpACIN-neo trial) (n = 12 responders, 6 non-respond- 
ers) (Extended Data Figs. 1d, 2c, Supplementary Tables 5, 6, 13). B cell 
signatures alone were predictive of response in univariable analyses 
(odds ratio 2.6, P = 0.02 for our trial, and odds ratio 2.9, P = 0.03 for
combined melanoma cohorts), but not in multivariable analyses when
considering other components of the immune cell infiltrate, which
suggests that B cells probably act together with other immune subsets 
and are not acting in isolation; however, these analyses were limited 
owing to the low sample size (Supplementary Tables 7, 8). Moreover, 
these findings were corroborated in translational studies of separate
cohorts of patients with melanoma and sarcoma who were treated
with ICB. B cells were not significantly associated with pathological
response rates in an analogous trial of neoadjuvant-targeted therapy
in patients with BRAF-mutated melanoma (Extended Data Fig. 1e,
Supplementary Table 9); however, B cells have previously been shown
to be positively associated with responses to chemotherapy in other
cancer types.
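The Fig. 1a legend selects differentially expressed genes with a fold change of >2 or <0.5 and an FDR q < 0.05. As a rough illustration of that filter, the sketch below assumes that the FDR is controlled with the Benjamini-Hochberg procedure (the text does not name the procedure) and uses hypothetical gene labels and values; it is not the pipeline used in the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def select_degs(genes, fold_changes, p_values, fc_hi=2.0, fc_lo=0.5, q_max=0.05):
    """Keep genes with fold change > fc_hi or < fc_lo and FDR q < q_max."""
    q_values = benjamini_hochberg(p_values)
    return [
        gene
        for gene, fc, q in zip(genes, fold_changes, q_values)
        if (fc > fc_hi or fc < fc_lo) and q < q_max
    ]


# Hypothetical inputs, for illustration only.
example_genes = ["geneA", "geneB", "geneC", "geneD"]
example_fold_changes = [3.0, 1.2, 0.4, 2.5]
example_p_values = [0.01, 0.04, 0.03, 0.005]
selected = select_degs(example_genes, example_fold_changes, example_p_values)
```

In this toy input, geneB passes the FDR threshold but is dropped because its fold change lies between the two cut-offs.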


Similar B cell signature observed in RCC 


To evaluate the validity of these findings across other cancer types, 
we next assessed the expression of these immune cell gene expres- 
sion signatures in a pre-surgical ICB trial for patients with metastatic 
renal cell carcinoma (RCC) (NCT02210117, PD1 blockade monotherapy
versus combined CTLA4 and PD1 blockade versus combined PD1
blockade and bevacizumab) (Supplementary Table 10). Gene expres- 
sion profiling by microarray and subsequent MCP-counter analysis 
of baseline tumour samples was performed, demonstrating signifi- 
cantly higher expression of B-cell-related genes in responders versus 
non-responders (P= 0.0011, n=17 responders and 11 non-responders) 
(Fig. 1d, Extended Data Figs. 1c, 2b, 3, Supplementary Tables 11-13). As 
in the case of melanoma, B cell signatures were predictive of a response
in univariable analysis in the RCC cohort (odds ratio 61.2, P=0.05) but 
not multivariable analysis, again suggesting cooperative function with 
other immune subsets; however, sample size was again limited (Sup- 
plementary Table 14). 


B cells prognostic in TCGA analysis


On the basis of these data and existing data regarding a potential 
prognostic role for TLSs in melanoma and other cancer types primarily
outside the context of ICB treatment, we next assessed
the expression of these immune-related genes in cutaneous melanoma
from The Cancer Genome Atlas (TCGA) platform (TCGA-SKCM,
n = 136). To this end, we applied the MCP-counter algorithm to
available RNA-seq data from a subset of patients with non-recurrent 
stage III disease (regional lymph node or regional subcutaneous 
metastases), as these were most comparable to our clinical cohort. In 
these studies, we identified three distinct melanoma immune classes 
(MICs), with significantly higher expression of B cells in cluster C 
than in cluster A (P< 0.0001) or cluster B (P< 0.0001) (Extended 
Data Fig. 4a, Supplementary Tables 15-17). Importantly, there was no 
clear association of MICs with known genomic subtypes of melanoma 
(BRAF, NRAS, NF1 or triple wild type) or disease site (nodal or non-nodal)
(Extended Data Fig. 4a, Supplementary Table 17). Survival
analyses revealed that cases in cluster C had significantly improved 
overall survival compared with cluster A (P= 0.0068) (Extended Data 
Fig. 4b). To assess the association with B cell signatures specifically, 
we next compared overall survival in patients with tumours high for 
B cell lineage versus low, which demonstrated prolonged survival in
patients with B cell-lineage-high tumours (P= 0.053) (Extended Data 
Fig. 4c). Furthermore, univariable Cox proportional hazards model- 
ling demonstrated that tumours with low infiltration of B cells had 
significantly increased risk of death (hazard ratio is 1.7 for B-cell-low, 
P=0.05) in comparison to the B-cell-high group (Supplementary 
Table 18). These data are further supported by recent analyses of the 
TCGA cohort that demonstrate the association of a plasmablast-like 
B cell signature with survival as well as increased expression of CD8A
and infiltration of CD8+ T cells. Similar analyses were performed
to assess the expression of immune-related genes in clear-cell RCC 




Fig. 1 | Transcriptional analysis of tumour specimens from patients with
high-risk resectable melanoma and metastatic RCC treated with pre-surgical
ICB. a, Supervised hierarchical clustering of differentially expressed
genes (DEG) on RNA-seq analysis by response of melanoma tumour specimens
at baseline, with responder defined as having a complete or partial response by
RECIST 1.1 and non-responder as having less than partial response (n = 9
non-responders and 7 responders). A cut-off of gene expression fold change of >2
or <0.5 and a false discovery rate (FDR) q < 0.05 was applied to select DEGs. Ipi,
ipilimumab; nivo, nivolumab. b, Volcano plot depiction of DEG by response


from the TCGA (TCGA-KIRC, n = 526). In these analyses, similar
immune classes were observed; however, immune infiltration was 
not associated with survival in these patients (P = 0.24) (Extended 
Data Fig. 4d-f, Supplementary Tables 19-21), possibly owing to the 
heterogeneous nature of this disease and other driving mechanisms 
of patient outcomes. 
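The survival comparisons above rest on estimated survival curves for each immune class. This section does not state which estimator was used; the sketch below assumes the standard Kaplan-Meier product-limit estimator and uses hypothetical follow-up data, so it should be read as an illustration of the idea and not as the analysis behind Extended Data Fig. 4.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Patients who died or were censored at t leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up (arbitrary time units): the second patient is censored.
example_curve = kaplan_meier([1, 2, 3], [1, 0, 1])
```

Two such curves (for example, B-cell-high versus B-cell-low tumours) would then typically be compared with a log-rank test or a Cox model, which are not shown here.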


B cells localized in the context of TLSs


On the basis of the results from gene expression profiling, we next 
assessed tumour samples histologically to gain insight into the den- 
sity and distribution of B cells as well as their relationship to TLSs 
in patients treated with neoadjuvant ICB. The density of CD20+ B
cells and TLSs, and the ratio of TLSs to tumour area were higher in
responders than in non-responders in our neoadjuvant melanoma
cohort, particularly in early on-treatment samples (P = 0.0008,
P = 0.001 and P = 0.002, respectively), although statistical significance
was not reached for all the markers in the baseline samples (P = 0.132,
P = 0.078 and P = 0.037, respectively) (Fig. 2a), which is consistent with
previous work that suggested that assessment of early on-treatment
immune infiltrate is far more predictive of the response to ICB than
assessment of pre-treatment samples. Findings between gene expression
profiling and immunohistochemistry analysis were complementary,
and had modest correlation as previously described (Extended
from same cohort as in a. R, responders; NR, non-responders. c, Supervised
clustering of melanoma tumour specimens by response at baseline (n = 11
non-responders and 10 responders), displaying MCP-counter scores. NK cells,
natural killer cells. d, Supervised clustering by clinical response defined as
achieving a partial response (PR) according to RECIST 1.1 and non-responders
as having progressive disease (PD) of RCC baseline tumour specimens (n = 11 PD
and 17 PR) using methodology as in c. P values were determined by two-sided
Mann-Whitney U-test. Bev, bevacizumab.


Data Fig. 5c–e). We also found increased numbers of B-cell-related
exosomes (CD20+) in the peripheral blood of responders compared
with non-responders at early on-treatment time points (Extended 
Data Fig. 2d-j). 

Notably, architectural analysis showed that CD20+ B cells were localized
in TLSs of tumours of responders, and were colocalized with CD4+,
CD8+ and FOXP3+ T cells. Colocalization with CD21+ follicular dendritic
cells and MECA79 high endothelial venules was also shown (Fig. 2d-f,
Extended Data Figs. 5a, 6a). The vast majority of evaluated TLSs in these
patients represented mature secondary-follicle-like TLSs, as indicated
by the presence of both CD21+ follicular dendritic cells and CD23+ germinal
centre B cells (Fig. 2d-f, Extended Data Figs. 5a, 6a). We identify
similar mature TLSs in patients with extra-nodal metastases (Extended 
Data Fig. 5b), which suggests that TLSs may develop in non-nodal sites 
and are associated with the response to ICB treatment. Analogous 
immunohistochemical findings were observed in our cohort of patients 
with RCC treated with pre-surgical ICB, with increased infiltration of 
CD20+ cells and TLS density associated with response to treatment
(Extended Data Fig. 6b-d); these TLSs are morphologically similar to 
those found in melanoma (Extended Data Fig. 6e-h). We also assessed 
the potential functional role of B cells and TLSs in promoting T cell 
responses in our cohort via additional spatial profiling analyses, and 
found increased markers of activation on T cells within as compared 
to those outside these TLSs (Extended Data Fig. 7a-c). 
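The figure legends report two-sided Mann-Whitney U-tests for the responder versus non-responder comparisons. A minimal sketch of that test follows, assuming the large-sample normal approximation without tie or continuity correction; with groups as small as those here an exact test would normally be preferred, and the input values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic for sample x, approximate two-sided P value).
    No tie or continuity correction is applied (a simplification).
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    # Assign midranks so that tied values share their average rank.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Hypothetical cell densities for two groups, for illustration only.
u_stat, p_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

Because the test uses only ranks, it makes no assumption that densities or scores are normally distributed, which suits small cohorts with skewed values.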
Fig. 2 | TLSs containing B cells, T cells and follicular dendritic cells are
predictive of response to ICB. a, Quantification of CD20+ cells by singlet
immunohistochemistry and association with response to neoadjuvant ICB in
resectable melanoma with responders defined as having complete or partial
response by RECIST 1.1 and non-responders as having less than a partial
response (n = 11 NR and 10 R at baseline and n = 11 NR and 9 R on treatment).
b, c, Density of TLSs (b) and ratio of tumour area occupied by TLSs (c) and
correlation by treatment response (n = 7 NR and 7 R at baseline and n = 10 NR
and 8 R after treatment). For a-c, bars indicate median values, and error bars


BCR and single-cell RNA-seq offer functional insight 

Next, we performed several in-depth analyses to gain insight into the 
phenotype and function of the infiltrating B cells, and how they might be
contributing to responses to ICB. Reasoning that differences in the clonotypes
of B cell receptors (BCRs) between responders and non-responders
would be indicative of an anti-tumour B cell response, we probed our
RNA-seq data for BCR sequences using the modified TRUST algorithm.
In these studies, we identified significantly higher clonal counts for
both immunoglobulin heavy and light chains (IgH and IgL; P = 0.001
and P = 0.004, respectively) and greater BCR diversity in responders
than in non-responders (P = 0.002 and P = 0.0008), which suggests an
active role for B cells in anti-tumour immunity (Fig. 3a, Extended Data
Fig. 8). To complement these analyses, we analysed single-cell RNA-seq
data from baseline and on-treatment samples from an independent 
cohort of patients with metastatic melanoma treated with ICB (n=48 
tumour samples; 1,760 B cells from 32 patients treated with PD1 blockade 
monotherapy, CTLA4 blockade monotherapy, or combined blockade 
of both PD1 and CTLA4, including samples from some patients in our
neoadjuvant ICB-treated cohort) (Supplementary Tables 22, 31). Similar 
to observations made in our clinical trial cohort, we found that B cells 
denote interquartile range; individual data points are shown. P values were
determined by two-sided Mann-Whitney U-test. d, Representative image of
CD20 staining of TLSs in a responder after treatment with ipilimumab and
nivolumab. e, Additional staining of boxed area in d showing associated
haematoxylin and eosin (H&E) staining and singlet immunostaining of CD20,
CD8, CD4, FOXP3 and CD21. f, Multiplex immunofluorescence assay of TLSs as
in d for the following markers: CD20, CD21, CD4, CD8, FOXP3 and DAPI. Original
magnification, ×20.


were significantly enriched in tumours from responders versus non- 
responders and were predictive of a response (odds ratio 1.05, P=0.02) 
(Fig. 3b, Extended Data Fig. 9a, Supplementary Table 23). Unbiased analysis
for markers of B cells (using all expressed genes in the CD45+CD19+
population only) associated with clinical outcome demonstrated 46 
markers were significantly enriched in lesions from responders and 147 
markers were significantly enriched in non-responder lesions (Extended 
Data Fig. 9b, Supplementary Tables 24, 25). Pathways upregulated in 
responders as compared to non-responders include those consistent 
with increased immune activity such as CXCR4 signalling, cytokine receptor
interaction and chemokine signalling pathways (Supplementary
Table 26). Unsupervised clustering of B cells using k-means clustering,
after testing for the robustness of each solution, identified four distinct
B cell clusters, G1 (B cells, switched, activated IgD− cells), G2 (plasma
cells), G3 (B cells unswitched IgD+) and G4 (B cells, switched, activated
IgD− cells, with unique markers relative to G1), each of which is
associated with different functional states (Fig. 3c, Extended Data Fig. 9c,
Supplementary Tables 27, 28). No significant differences were identified 
when testing for associations of each individual cluster (G1-G4) with 
the clinical outcome, probably owing to limited sample size. Pathway 
analysis was also performed on bulk RNA-seq data from our clinical trial 
Fig. 3 | Analyses of B-cell receptor clones and single-cell analyses suggest
active role for B cells in anti-tumour immunity. a, Normalized clonal counts
for BCRs identified in patients with high-risk resectable melanoma treated with
neoadjuvant ICB. Both the IgH and IgL are evaluated with responders and
non-responders as shown. All samples analysed at baseline. b, Scatter plots
demonstrating the percentage of various cell types as indicated between
responders (n = 17) and non-responders (n = 31) from a separate cohort of
patients with advanced melanoma analysed by single-cell RNA-seq. Samples
before and after treatment are combined. B cells are represented by the
CD45+CD19+ population. Data are median values with interquartile ranges, and
individual data points are shown. P values were determined by two-sided
Mann-Whitney U-test. Adjustments for multiple comparisons were not made.
c, t-distributed stochastic neighbour embedding (t-SNE) plot of all B cells
collected and analysed by single-cell RNA-seq in b. Cells are coloured based on


cohort, revealing increased immune signalling pathways in responders
than in non-responders, including T cell receptor signalling, major
histocompatibility complex-mediated antigen presentation and processing,
differentiation of T helper 1 and 2 (TH1 and TH2) cells, and costimulatory
signalling associated with T cell signalling (Supplementary Tables 29, 30).
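The four B cell clusters (G1-G4) described above were obtained by k-means clustering. As a rough sketch of that technique, the code below implements plain Lloyd's algorithm on small numeric tuples. The initialization (first k points), the toy two-dimensional data and the choice k = 2 are assumptions made for illustration; the robustness testing mentioned in the text is not reproduced.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Initial centroids are the first k points (a deterministic, naive choice).
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:
            break  # assignments unchanged, so the solution has converged
        labels = new_labels
    return labels, centroids


# Hypothetical two-marker expression values for four cells.
toy_cells = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
toy_labels, toy_centroids = kmeans(toy_cells, k=2)
```

In practice the result depends on the initial centroids and on k, which is one reason to test the stability of each solution before interpreting clusters as biological states.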


CyTOF shows differential B cell phenotypes 

To gain further insight into the potential functional role of B cells in the
response to ICB, we performed mass cytometry (CyTOF) in evaluable 
tumour and peripheral blood samples (seven responders and three non- 
responders for tumour, and four responders and four non-responders 
for peripheral blood from our neoadjuvant ICB trial). Sample size was 
limited owing to the amount of tumour available given prioritization 
for other studies as well as tumour viability. These analyses included 
patients with nodal and non-nodal metastases (Extended Data Fig. 10a, 
Supplementary Tables 31, 32). 


four clusters identified by k-means clustering (G1-G4). Number of cells 
analysed is 1,760 B cells from 48 tumours arising in 32 patients treated with PD1 
blockade monotherapy, CTLA4 blockade monotherapy, or combined PD1 and 
CTLA4 blockade. d, t-SNE plots demonstrating peripheral blood and 
intratumoral combined B cell populations from mass cytometric analyses in 
responders versus non-responders (n = 4 R and 4 NR for peripheral blood and
n = 5 R and 3 NR for tumour) from the neoadjuvant ICB trial in patients with
advanced melanoma. e, Intratumoral B cell phenotypes included in d grouped
by response. f, Quantification of B cell subtypes in e. Plots in d-f represent
combined analyses of tumours run simultaneously with the peripheral blood
samples (n = 5 R and 3 NR) and include baseline and on-treatment samples as
described in Supplementary Table 31. Statistical analyses including all samples 
are presented in Extended Data Fig. 10b. 


We first assessed differences between intratumoral B cells and those 
in the peripheral blood of patients. In these studies, unique clusters of
CD45+CD19+ (B cell) populations including naive (CD19+, CD27−, IgD+),
transitional (CD19+, CD24++, CD38++, CD10+, CD27−, IgD+), unswitched
and switched memory (CD19+, CD27+, IgD+/−), double-negative (CD19+,
CD27−, IgD−), and plasma(-like) cell (CD19+, CD20−, CD22−, CD38++,
CD27++) populations were found in peripheral blood and tumour samples,
with distinct profiles in the tumour compared with peripheral
blood samples (Fig. 3d, Extended Data Figs. 10a, b, 11a, b). Intratumoral 
B cells had reduced expression of CD21, CD23, CD79b and CXCR5, pointing
to distinct functional and migratory profiles compared to similar
B cell populations in the peripheral blood (Extended Data Fig. 11b). We
next compared the phenotypes of B cells in tumours and peripheral
blood from responders and non-responders to ICB treatment. Although
B cell subsets (naive, memory and transitional B cells and plasma cells)
in the peripheral blood had a similar distribution in responders and non-
responders (Fig. 3d, Extended Data Fig. 10b), significant differences 
were noted in the subsets of B cells in tumours (Fig. 3e, f, Extended Data
Fig. 10b). Specifically, tumours from responders had a significantly 
higher frequency of memory B cells, whereas non-responders had a 
significantly higher frequency of naive B cells (P= 0.033 for naive and 
P=0.033 for memory) (Fig. 3e, f, Extended Data Fig. 10b). Other notable 
differences included an increase in plasma cells in responders com- 
pared with non-responders; however, this did not reach significance and 
was largely driven by data from one patient (P= 0.3) (Fig. 3e, f, Extended 
Data Fig. 10b). More granular characterization of the intratumoral B 
cells reveals an increased percentage of CXCR3+ switched memory B
cells (P = 0.0083) in responders than in non-responders; we also note
increased CD86+ B cells (P = 0.017) and increased germinal-centre-like
(CD19+, CD20++, CD38+, CD27, IgD−, CD86+, CD95+) B cells (P = 0.24) in
responders as compared to non-responders (Extended Data Figs. 10c,
d, 11c). Increased proliferation of B cells suggestive of germinal centre 
formation and activity is observed within TLSs (Extended Data Fig. 7d). 


Summary 


In summary, we present multiomic data that support a role for B cells
within TLSs in the response to ICB in patients with metastatic mela- 
noma and RCC. Although the distinct mechanisms through which B 
cells contribute are incompletely understood, our data suggest that 
the same properties of memory B cells and plasma cells desirable for 
acquired immune responses may also be contributing to an effective 
T cell response after ICB. Importantly, these B cells are probably acting
together with other key immune constituents of the TLS by altering 
T cell activation and function as well as through other mechanisms. 
Memory B cells may be acting as antigen-presenting cells, driving 
the expansion of both memory and naive tumour-associated T cell 
responses. B cells can also secrete an array of cytokines (including 
TNF, IL-2, IL-6 and IFNγ), through which they activate and recruit other
immune effector cells, including T cells. The observation of switched 
memory B cells (that can differentiate into plasma cells) in responders 
suggests that they could be potentially contributing to the anti-tumour 
response by producing antibodies against the tumours. Although 
we did not have adequate samples to study this in our cohort, it is an 
important line of investigation moving forward, and insights could lead 
to new therapeutic approaches to enhance responses to ICB. Together, 
findings in these cohorts are provocative and represent important 
advances in our insight into therapeutic responses to ICB. Further 
studies are needed in additional (and larger) cohorts across tumour 
types and stage of disease, as well as with therapeutic regimens. These 
types of studies along with pre-clinical models will help lend statistical 
power to the notion that B cells independently contribute to anti- 
tumour immune function in the context of ICB therapy, and also to 
better understand the mechanisms through which B cells and TLSs 
may favourably affect responses. Nonetheless, findings from these 
unique cohorts provide important insight into the role of B cells and 
TLSs in therapeutic responses to ICB, and are likely to stimulate further 
research in this area. 
Methods


Patient cohort(s) and sample collection 

For the melanoma neoadjuvant cohort (NCT02519322), 23 patients
were enrolled in a phase II clinical trial of neoadjuvant ICB. Twelve patients
received nivolumab monotherapy with 3 mg kg−1 every 2 weeks for up to
4 doses, and 11 patients received ipilimumab 3 mg kg−1 with nivolumab
1 mg kg−1 every 3 weeks for up to 3 doses followed by surgical resection.
These patients were treated at the University of Texas MD Anderson 
Cancer Center and had tumour samples collected and analysed under 
Institutional Review Board (IRB)-approved protocols (2015-0041, 2012- 
0846). Of note, these studies were conducted in accordance with the 
Declaration of Helsinki and approved by the UT MD Anderson Cancer
Center IRB. Response was defined as achieving a complete or partial 
radiographic response by RECIST 1.1 between pre-treatment imaging 
and post-neoadjuvant treatment imaging before surgical resection. 
Tumour samples were collected at several time-points for correla- 
tive studies including baseline and on-treatment (weeks 3 and 5 for 
nivolumab monotherapy, weeks 4 and 7 for combination ipilimumab 
with nivolumab). Tumour samples were obtained as core, punch or exci- 
sional biopsies performed by treating clinicians or an interventional 
radiologist. Samples were immediately formalin-fixed and paraffin- 
embedded (FFPE), snap-frozen or digested following tissue collection. 

Additional patients off-protocol included five patients with widely 
metastatic melanoma who were treated at the University of Texas MD 
Anderson Cancer Center and had tumour samples collected and analysed
under IRB-approved protocols (LAB00-063 and PA17-0261).
Samples were immediately FFPE after tissue collection. 

For the validation melanoma cohort, we used samples of 18 patients 
enrolled in the OpACIN-neo trial (NCT02437279). In the phase 1b 
OpACIN-neo trial, 20 patients with palpable stage III melanoma were 
randomized 1:1 to receive ipilimumab 3 mg kg−1 and nivolumab 1 mg
kg−1, either 4 courses after surgery (adjuvant arm), or 2 courses before
surgery and two courses post-surgery (neoadjuvant arm). Coprimary 
endpoints were safety/feasibility and tumour-specific expansion of 
T cells. For this current correlative study, response was defined as not 
having disease relapse. These patients were treated at the Netherlands 
Cancer Institute (Amsterdam). The study was conducted in accord- 
ance with the Declaration of Helsinki and approved by the medical 
ethics committee of the Netherlands Cancer Institute. All subjects 
provided informed consent before their participation in the study. 
Patients underwent a pre-treatment tumour biopsy (1× formalin-fixed
and paraffin-embedded (FFPE) and 2× fresh frozen) obtained as a core
biopsy performed by a radiologist. RNA was extracted from one frozen
biopsy for RNA-seq analysis. We included only 18 patients in our
analysis because the tumour purity in the frozen pre-treatment biopsy
of 2 patients was too low for RNA to be isolated. The clinical responses
of this cohort have been previously described.

The RCC trial was an open-label, randomized, pre-surgical/pre-biopsy 
trial (NCT02210117) in which adults with metastatic RCC without previous
immune checkpoint therapy and anti-VEGF therapy were enrolled
and randomized 2:3:2 to receive nivolumab (3 mg kg−1 once every 2
weeks, ×3 doses), nivolumab plus bevacizumab (3 mg kg−1 once every
2 weeks ×3 plus 10 mg kg−1 ×3) or nivolumab plus ipilimumab (3 mg kg−1
once every 2 weeks ×3 plus 1 mg kg−1 ×2), followed by surgery (cytoreductive
nephrectomy or metastasectomy), or biopsy at week 8-10, and subse- 
quent nivolumab maintenance therapy for up to 2 years. Response was 
assessed at 8 weeks and then at >12 weeks by RECIST 1.1 criteria. Clinical 
response data collection is still ongoing. For this current correlative 
study, clinical response for primary endpoint analysis was defined 
as achieving a complete or partial response at >12 weeks. Blood and 
tumours before and after treatment were obtained for correlative stud- 
ies by IRB-approved laboratory protocol PA13-0291. Tumour samples 
were obtained as core biopsies or surgical resection performed by 


interventional radiologists or surgeons. Samples were immediately 
FFPE or snap-frozen after tissue collection. 

The single-cell RNA-seq B cell analysis used a dataset from 32 patients 
with metastatic melanoma (n = 48 samples) treated with anti-PD1 
(n=37), anti-CTLA4 (n=2), or anti-PD1 and anti-CTLA4 (n=9)**. Patient 
response was determined by RECIST criteria: complete response and 
partial response for responders, or stable disease and progressive 
disease for non-responders. For the analysis, we focused on individ- 
ual lesions and classified them into two categories: responder (n=17) 
including complete-response and partial-response samples; non- 
responder (n= 31) including stable-disease and progressive-disease 
samples, based on radiological tumour evaluations. Samples were 
collected after patients provided written consent for research and
genomic profiling of collected tissue as approved by the Dana-Farber/
Harvard Cancer Center Institutional Review Board (DF/HCC protocol
11-181) and UT MD Anderson Cancer Center (LAB00-063 and 2012-0846).

For the targeted therapy cohort, 13 patients received neoadjuvant 
and adjuvant dabrafenib and trametinib as part of a single-centre, 
open-label randomized phase 2 trial for patients with BRAF(V600E) 
or BRAF(V600K) (that is, Val600Glu or Val600Lys)-mutated melanoma
(NCT02231775)—8 weeks of neoadjuvant oral dabrafenib 150 mg twice 
per day and oral trametinib 2 mg per day followed by surgery, then up 
to 44 weeks of adjuvant dabrafenib plus trametinib starting 1 week 
after surgery for a total of 52 weeks of treatment®. Patient radiographic 
response was determined by RECIST criteria with stable disease (non- 
responders) and partial response or complete response (responders) 
noted and coded as indicated; and pathological complete response 
determined by absence of residual viable malignant cells on H&E stain- 
ing. These patients were treated at the University of Texas MD Anderson 
Cancer Center and had tumour samples collected and analysed under 
IRB-approved protocols. These studies were conducted in accordance 
with the Declaration of Helsinki.

The authors confirm that, for all studies involving human research
participants, we have complied with all relevant ethical regulations.


Gene expression profiling and analysis: RNA extraction for 
neoadjuvant melanoma ICB-treated cohort 

Total RNA was extracted from snap-frozen tumour specimens using the 
AllPrep DNA/RNA/miRNA Universal Kit (Qiagen) following assessment 
of tumour content by a pathologist, and macrodissection of tumour 
bed if required. RNA quality was assessed on an Agilent 2100 Bioana- 
lyzer using the Agilent RNA 6000 Nano Chip with smear analysis to 
determine DV200 and original RNA concentration. On the basis of 
RNA quality, 40-80 ng of total RNA from each sample then underwent 
library preparation using the Illumina TruSeq RNA Access Library Prep 
kit according to the manufacturer’s protocol. Barcoded libraries were 
pooled to produce final 10-12 plex pools before sequencing on an 
Illumina NextSeq sequencer using one high-output run per pool of 
76-bp paired-end reads, generating 8 fastq files (4 lanes, paired reads) 
per sample. 


RNA-seq data processing and quality check 

RNA-seq FASTQ files were first processed through FastQC (v.0.11.5)*, 
a quality control tool to evaluate the quality of sequencing reads at 
both the base and read levels. The reads that had > 15 contiguous low- 
quality bases (phred score < 20) were removed from the FASTQ files. 
STAR 2-pass alignment (v.2.5.3)*° was then performed on the filtered 
FASTQ files with default parameters to generate an RNA-seq BAM file for
each sequencing event. After that, RNA-SeQC (v.1.1.8)” was run on the 
aligned BAM files to generate a series of RNA-seq related quality control 
metrics including read counts, coverage, and correlation. A matrix of 
Spearman correlation coefficients was subsequently generated by RNA-SeQC
among all sequencing events. The correlation matrix was carefully
reviewed, and any sequencing event generated from a library pool
that showed poor correlation with other library pools from the same
RNA sample was removed before sample-level merging of BAM files.


Gene expression quantification and normalization 

The HTSeq-count (v.0.9.1)** tool was applied to aligned RNA-seq BAM
files to count, for each gene, how many aligned reads overlapped with its
exons. The raw read counts generated from HTSeq-count (v.0.9.1)*
were normalized into fragments per kilobase of transcript per million
mapped reads (FPKM) using the RNA-seq quantification approach
suggested by the bioinformatics team of the NCI Genomic Data Commons
(GDC; https://gdc.cancer.gov/about-data/data-harmonization-and-generation/genomic-data-harmonization/high-level-data-generation/rna-seq-quantification).
In brief, FPKM normalizes read count by
dividing it by the gene length and the total number of reads mapped to
protein-coding genes using a calculation described below:


$$\mathrm{FPKM} = \frac{RC_g \times 10^9}{RC_{pc} \times L}$$

in which $RC_g$ denotes the number of reads mapped to the gene; $RC_{pc}$
denotes the number of reads mapped to all protein-coding genes; and
$L$ denotes the length of the gene in base pairs (calculated as the sum of
all exons in a gene). The FPKM values were then log₂-transformed for
further downstream processes.
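
As an illustration of the normalization described above, the following minimal Python sketch computes FPKM from a gene's read count, the total protein-coding read count and the gene length, and then applies a log₂ transform. The function names and the pseudocount of 1 added before the log transform are assumptions made for this example; the text does not state which pseudocount, if any, was used.

```python
import math


def fpkm(rc_gene, rc_protein_coding, gene_length_bp):
    """FPKM = RC_g * 1e9 / (RC_pc * L), following the GDC-style definition."""
    if rc_protein_coding <= 0 or gene_length_bp <= 0:
        raise ValueError("total read count and gene length must be positive")
    return rc_gene * 1e9 / (rc_protein_coding * gene_length_bp)


def log2_fpkm(value, pseudocount=1.0):
    """log2 transform; the pseudocount is an assumption that avoids log2(0)."""
    return math.log2(value + pseudocount)


# Hypothetical gene: 500 reads, 20 million protein-coding reads, 2,000 bp of exons.
value = fpkm(500, 20_000_000, 2000)
print(value, log2_fpkm(value))
```

Multiplying by 10⁹ combines the per-kilobase (10³) and per-million-reads (10⁶) scaling in a single factor.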


RNA-seq analysis for OpACIN-neo trial 
RNA-seq and data analysis were performed as previously described”. 


Affymetrix microarray for RCC 

The Affymetrix microarray data were created using the Affymetrix 
Clariom D Assay (Human). There were 28 available pre-treatment samples
from 3 arms: nivolumab (n = 6), nivolumab plus bevacizumab (n= 14) 
and nivolumab plus ipilimumab (n=8). The raw CEL files were normal- 
ized using the built-in SST-RMA method of the Affymetrix Transcrip- 
tome Analysis Console (TAC, v.4.0) software. The cell lineage scores 
were calculated using the R package MCP-counter algorithm (v.1.1.0). 
The Limma R software package” was used to identify DEGs from nor- 
malized microarray data for the RCC cohort. 


Identification of DEGs 

The HTSeq normalized read count data for all expressed coding transcripts
were processed by DESeq2 (v.3.6)°° software to identify DEGs
between the two response groups (responders versus non-responders). A
cut-off of gene-expression fold change of >2 or <0.5 and an FDR q < 0.05
was applied to select the most significant DEGs. The Limma R software package”
was used to identify DEGs from normalized microarray data for the 
RCC cohort. 
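
The fold-change and FDR cut-offs above amount to a simple filter on a table of differential expression results. The sketch below assumes such results are already available as (gene, fold change, q value) tuples on a linear fold-change scale; the function name and the gene labels are hypothetical and are not output from DESeq2 or Limma.

```python
def select_degs(results, fc_up=2.0, fc_down=0.5, q_max=0.05):
    """Return gene names passing the fold-change and FDR cut-offs.

    results: iterable of (gene, fold_change, q_value) tuples, with
    fold_change on the linear (not log) scale.
    """
    selected = []
    for gene, fold_change, q_value in results:
        strong_change = fold_change > fc_up or fold_change < fc_down
        if strong_change and q_value < q_max:
            selected.append(gene)
    return selected


# Hypothetical results table used only to exercise the filter.
example = [
    ("GENE_A", 3.2, 0.010),  # upregulated and significant: kept
    ("GENE_B", 0.4, 0.030),  # downregulated and significant: kept
    ("GENE_C", 1.5, 0.001),  # significant, but the change is too small
    ("GENE_D", 4.0, 0.200),  # large change, but not significant
]
print(select_degs(example))
```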


Deconvolution of the cellular composition with MCP-counter 
The R package software MCP-counter’ was applied to the normalized
log₂-transformed FPKM expression matrix to produce the absolute
abundance scores for eight major immune cell types (CD3⁺ T cells,
CD8⁺ T cells, cytotoxic lymphocytes, natural killer cells, B lymphocytes,
monocytic lineage cells, myeloid dendritic cells and neutrophils),
endothelial cells and fibroblasts. The deconvolution profiles
were then hierarchically clustered and compared across response and 
treatment groups. 


Pathway enrichment analyses 

The network-based pathway enrichment analysis was performed using 
DEGs across responder and non-responder groups in the bulk-tissue 
RNA-seq data from the melanoma neoadjuvant cohort and single-cell 
RNA-seq data from the metastatic melanoma cohort. In the bulk-tissue data,
the differentially expressed genes that had a q < 0.05 and log₂-transformed
fold change >1.5 or < −1.5 were selected as input for network-based
pathway enrichment analysis using the ReactomeFIViz™ application
in Cytoscape””’. In the single-cell data, the DEGs with q < 0.1 were selected as
input for pathway enrichment analysis. Pathway enrichment was cal- 
culated using several biological databases (KEGG, NCBI, Reactome, 
Biocarta and Panther) with hypergeometric test FDR < 0.01. 


TCGA SKCM and KIRC data downloading and patient selection 

The normalized RNA-seq expression data of TCGA skin cutaneous
melanoma (TCGA-SKCM) and kidney renal clear cell carcinoma
(TCGA-KIRC) were downloaded from the NCI Genomic Data Commons
(GDC; https://portal.gdc.cancer.gov) and the relevant clinical data
were downloaded from a recent TCGA PanCancer clinical data study™.
The information on SKCM genomic subtypes was obtained from the
TCGA-SKCM study®. To achieve a uniform cohort of patients with stage
III (non-recurrent) melanoma for analysis, we applied an appropriate 
set of sequential filters: the TCGA-SKCM cohort was filtered to include 
patients with biospecimen tissue sites that included regional lymph 
node or regional subcutaneous metastases. We excluded patients pre- 
senting with stage IV disease. Then, to exclude patients with recurrent 
stage III disease, we excluded all patients for whom the number of days 
from the diagnosis of the primary to the accession date was more than 
90 days. In addition, for a patient to be included, their tumour must 
also have had a defined melanoma driver type. Finally, we eliminated 
those lacking sufficient gene expression data, yielding a final stage III 
TCGA-SKCM cohort of n= 136. Survival data were missing for 9 of 136 
samples, so n = 127 samples were available for overall survival analy- 
ses. For TCGA-KIRC, the cases without available expression data were 
excluded and a total of 526 cases were taken into subsequent analysis. 


Survival analyses 

In the TCGA cohort, survival data were not available for nine samples and
these were excluded from survival analysis. As previously described”,
the survival time for each patient in the SKCM melanoma cohort was
‘curated TCGA survival’ (that is, from time of TCGA biospecimen procurement).
The time to event was defined as the time interval from the
date of accession for each sample to the date of death or censoring from
any cause (curated value CURATED_TCGA_days_to_death_or_last_follow-up;
also known as TCGA post-accession survival). The survival analysis was
performed using a Cox proportional hazards model and survival curves
were plotted using the Kaplan-Meier method. The statistical comparison
of the survival curves was done using the log-rank test. The analysis
was done using R package survival (https://cran.r-project.org/web/ 
packages/survival/index.html). 
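
The analysis itself was run with the R survival package. Purely to illustrate what the Kaplan-Meier estimator computes, a minimal Python sketch is given below; the function name and the toy follow-up times are hypothetical, and the sketch does not cover the Cox model or the log-rank test.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time per sample (for example, days since accession)
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all samples sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored samples both leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Toy data: five samples, two of them censored.
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
print(curve)
```

Censored samples contribute to the number at risk up to their censoring time but never trigger a drop in the curve, which is why the last (censored) sample adds no step.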


Statistical analyses 

The statistical comparison between responder and non-responder 
groups for a given continuous variable was performed using a two-sided
Mann-Whitney U-test. The association between two continuous vari- 
ables was assessed using Spearman’s rank correlation coefficient. To 
control for multiple comparisons, we applied the Benjamini-Hochberg 
method” and calculated adjusted P values. Univariable and multivari- 
able analysis predicting response to ICB was performed using logistic 
regression modelling. Biological replicates are indicated in the individual
figure legends. Technical replicates were constrained to n = 1
per time point, owing to limited tissue availability in patient-derived
samples as well as prioritization for multiple studies. No statistical 
methods were used to predetermine sample size. The experiments 
were not randomized, and investigators were not blinded to allocation 
during experiments and outcome assessment unless stated otherwise. 
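
For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following short Python sketch shows the standard step-up calculation of adjusted P values. It is an illustrative implementation with a hypothetical function name, not the code used in the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from four comparisons.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```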


Single immunohistochemistry 

H&E and immunohistochemistry staining were performed on FFPE 
tumour tissue sections. The tumour tissues were fixed in 10% formalin, 
embedded in paraffin, and serially sectioned. Four-micrometre sections 
were used for the histopathological study. 
Sections were stained with mouse or rabbit anti-human monoclonal
antibodies against CD20 (Dako, M0755, 1:1,400), CD21 (Novocastra, 
NCL-L-CD21-2G9, 1:10 or Leica, CD21-2G9; 1:20), CD23 (Leica, CD23-1B12,
1:15), CD4 (Novocastra, CD4-368-L-A, 1:80), CD8 (Thermo Scientific,
MS-457-S, 1:25) and FOXP3 (Biolegend, 320102, 1:50). All sections
were counterstained with haematoxylin, dehydrated and mounted.
All sections were processed with peroxidase-conjugated avidin/biotin
and 3,3′-diaminobenzidine (DAB) substrate (Leica Microsystems), and
slides were scanned and digitized using the ScanScope XT system
(Aperio/Leica Technologies).

Quantitative analysis of immunohistochemistry staining was con- 
ducted using the image analysis software ImageScope-Aperio/Leica. 
Five random areas (1 mm² each) were selected using a customized algorithm
for each marker in order to determine the number of positive cells
at high power field. The data are expressed as a density (total number
of positive cells per mm² area). Immunohistochemistry staining was
interpreted in conjunction with H&E stained sections. 


TLS quantification 

TLSs were qualified and quantified using both H&E and CD20 immunohistochemistry
staining. Structures were identified as aggregates
of lymphocytes with histological features analogous
to those of lymphoid tissue with germinal centres (including B
cells (CD19/20), T cells (CD3), follicular dendritic cells (CD21) and high
endothelial venules (MECA79)), appearing in the tumour area. For
the current study, the criteria used for the quantification of TLSs included:
(1) the total number of structures identified either within the tumoral
area or in direct contact with the tumoral cells on the margin of the
tumours (number of TLSs per mm² area); and (2) a normalization of
the total area occupied by the TLSs in relation to the total area of the
tumour analysed (ratio: area of TLSs/area of tumour + TLSs).


Multiplex immunofluorescence assay and analysis 

For the immunofluorescence multiplex staining shown in Fig. 2 and
Extended Data Fig. 6, we followed the staining method for the
following markers: CD20 (Dako, M0755, 1:500) with subsequent visualization
using fluorescein Cy3 (1:50); CD21 (Novocastra, NCL-L-CD21-2G9,
1:10) with subsequent visualization using fluorescein Cy5 (1:50);
CD4 (CM153BK, Biocare, 1:25) with subsequent visualization using 
fluorescein Cy5.5 (1:50); CD8 (1:200, M7103, Dako) with subsequent 
visualization using fluorescein Cy3.5 (1:50); FOXP3 (Biolegend, 320102, 
1:50) with subsequent visualization using fluorescein FITC (1:50) and 
nuclei visualized with DAPI (1:2,000). All of the sections were cover- 
slipped using Vectashield Hardset 895 mounting medium. 

The slides were scanned using the Vectra slide scanner (PerkinElmer). 
For each marker, the mean fluorescent intensity per case was then deter- 
mined as a base point from which positive calls could be established. 
For multispectral analysis, each of the individually stained sections 
was used to establish the spectral library of the fluorophores. Five 
random areas on each sample were analysed blindly by a pathologist 
at 20x magnification. 

For the additional multiplex staining shown in Extended Data Fig. 5,
we followed similar methods to the above
for the following markers: MECA79-Dy550 (Novus, MECA-79, 1:100);
CD20-Dy594 (Novus, IGEL/773; 1:100); CD4-AF647 (Abcam, ERP6855,
1:100); and nuclei visualized with Syto13 at 500 nM. The slides were 
scanned with the GeoMx DSP machine as described below. 


GeoMx Digital Spatial Profiling: microscope and fluidics system 
overview 

For immune profiling of T cells located within and outside TLS structures
in patient samples, the GeoMx Digital Spatial Profiler (NanoString), a 
custom-built high-speed automated system and integrated instrument 
software, was used. A multiplexed cocktail of primary antibodies with 
UV photocleavable indexing oligonucleotides (GeoMx Immune Profile
Core; 22 targets, including 3 isotype controls and 4 additional modules:
IO Drug Target, Immune Activation Status, Immune Cell Typing, and
Pan Tumour) and 4 fluorescent markers was applied to a slide-mounted
FFPE tissue section. For the fluorescent markers, we used Syto13 at 500
µM for nuclei visualization; CD20-Dy594 (Novus, IGEL/773; 1:100);
CD3-AF647 (Novus, C3e/1308; 1:100); and PMEL-Dy550 (Novus, HMB45;
1:100) with S100B-Dy550 (Novus, 15F4NB; 1:100). Images at ×20 magnification
were assembled to yield a high-resolution image of the tissue
area of interest. The specific regions of interest (ROIs) for molecular 
profiling were then selected based on location (TLS or non-TLS areas of 
tumour) and CD3-positive staining and sequentially processed by the 
microscope automation. ROIs were selectively illuminated with UV light 
to release the indexing oligos by coupling UV LED light with a double 
digital mirror device (DDMD) module. Following each UV illumination 
cycle, the eluent was collected from the local region via microcapillary 
aspiration and transferred to an individual well of a microtiter plate. 
Once all ROIs were processed, pools of released indexing oligos were 
hybridized to NanoString optical barcodes for digital counting and 
subsequently analysed with an nCounter Analysis System. 


nCounter hybridization assay for photocleaved oligo counting 
Hybridization of cleaved indexing oligonucleotides to fluorescent 
barcodes was performed using the nCounter Protein PlexSet reagents 
according to the manufacturer’s directions. Hybridizations were performed at
65°C overnight in a thermocycler. After hybridization, samples were 
processed using the nCounter Prep Station and Digital Analyzer according to the
manufacturer’s instructions. Data were normalized to technical controls
and area. Data were calculated against isotype controls to generate
signal-to-noise ratios. Protein targets with a signal-to-noise ratio of less
than 2 were removed from downstream analysis.
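
The signal-to-noise filter can be sketched as follows. The text does not say how the isotype controls were combined into a background value, so this example assumes the geometric mean of the isotype-control counts; that choice, the function names and the target labels are all assumptions made for illustration.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def filter_by_snr(counts, isotype_names, min_snr=2.0):
    """Keep targets whose signal-to-noise ratio is at least min_snr.

    counts:        dict mapping target name to normalized count
    isotype_names: names of the isotype-control entries in counts
    Background is ASSUMED to be the geometric mean of the isotype controls.
    """
    background = geometric_mean([counts[name] for name in isotype_names])
    kept = {}
    for target, count in counts.items():
        if target in isotype_names:
            continue
        snr = count / background
        if snr >= min_snr:
            kept[target] = snr
    return kept


# Hypothetical normalized counts for one region of interest.
counts = {"IgG_ctrl_1": 10.0, "IgG_ctrl_2": 40.0, "target_A": 100.0, "target_B": 30.0}
kept = filter_by_snr(counts, ["IgG_ctrl_1", "IgG_ctrl_2"])
print(kept)
```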


B cell clonotype analyses

The modified TRUST algorithm” was applied to extract the B cell 
immunoglobulin hypervariable regions from the bulk RNA-seq data and
assemble the complementarity-determining region 3 (CDR3) sequences 
of the B cell heavy chain (IgH) and light chain (IgL). BCR clonotypes 
were identified and the clonal fraction was automatically calculated by 
TRUST. The output of TRUST was parsed by the R package tcR (v.3.4.1)° 
for downstream analyses. Only in-frame productive clonotypes were 
taken into subsequent analysis. The total number of BCR clonotypes 
detected per sample was normalized by the corresponding sequenc- 
ing depth of each individual sample and calculated as per 100 million 
mapped reads. The top five clonotypes were selected by their clonal 
expression abundance. The BCR repertoire diversity was calculated 
by entropy from the tcR package. 
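
The depth normalization and the diversity measure described above can be illustrated with a small Python sketch. The study used the R package tcR for these steps; the function names here are hypothetical, and the use of base-2 logarithms (entropy in bits) is an assumption made for this example.

```python
import math


def clonotypes_per_100m_reads(n_clonotypes, mapped_reads):
    """Scale a clonotype count to a depth of 100 million mapped reads."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return n_clonotypes * 1e8 / mapped_reads


def shannon_entropy(clone_counts):
    """Shannon entropy (in bits, an assumption) of a clonotype distribution."""
    total = sum(clone_counts)
    entropy = 0.0
    for count in clone_counts:
        if count > 0:
            fraction = count / total
            entropy -= fraction * math.log2(fraction)
    return entropy


# Hypothetical sample: 150 clonotypes detected in 50 million mapped reads.
print(clonotypes_per_100m_reads(150, 50_000_000))
# Four equally abundant clonotypes give the maximum entropy for four clones.
print(shannon_entropy([1, 1, 1, 1]))
```

A repertoire dominated by a single clone yields an entropy near 0, whereas an even repertoire yields the maximum value for its number of clonotypes.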


Single-cell sequencing and analysis of CD45⁺ B cells

Freshly isolated tumour samples were dissociated using the human
tumour dissociation kit (Miltenyi Biotec; 130-095-929), sorted
into 96-well plates containing 10 µl of TCL buffer (Qiagen) with 1%
β-mercaptoethanol, using the following anti-human antibodies:
FcX (Biolegend, 422302), CD45-PE (Biolegend, 304008), CD3-APC
(Biolegend, 300412), CD235a-APC/Cy7 (Biolegend, 349116) and HLA- 
A,B,C-FITC (Biolegend, 311426). Sorting of viable cells was performed 
using the live/dead dye Zombie Violet (Biolegend, 77477). Single-cell 
libraries were generated using a modified version of the full-length 
Smart-seq2 protocol as previously described“, and were sequenced on 
a NextSeq 500 sequencer (Illumina), resulting in a median of approximately
1.4 million paired-end reads and a median of 2,588 genes
detected per cell. A cutoff of log₂(transcripts per million (TPM) + 1) > 2
was used to define a gene as expressed in each single cell. For each 
sample, we computed the fraction of B cells using pre-defined markers 
(CD19 and/or MS4A1). Notably, this is a plate-based protocol; thus, for
each patient, we collected and sequenced the same number of cells
(n = 384 CD45⁺ cells per plate). The number of cells per patient
is therefore equal, and the frequency reflects patients with either high or low
B cell infiltrate.


Unsupervised clustering of immune cells 

To cluster all cells that passed quality control, we applied the k-means 
algorithm with a correlation distance metric, testing k=3, ..., 15. The 
algorithm was applied using all genes with variance >6, yielding approx- 
imately 4,000 genes. This value was selected based on the relation 
between the variance and the fraction of cells expressing each gene. 
To determine the optimal number of clusters we applied the following 
steps: (1) we first examined how much of the complexity each cluster 
captures by applying the elbow method. This was done by computing 
the Pearson correlation matrix R and the distance matrix D as (1 − R).
We then computed the sum of pairwise distances between all cells in
different clusters, $Dis_b = \sum_{l=1}^{k} \sum_{i \in C_l} \sum_{j \notin C_l} D(i,j)$, and the total distance,
$Dis_t = \sum_{i,j} D(i,j)$, in which $C_l$ denotes the l-th of the k clusters
and i and j stand for each pair of single cells. The
ratio between these two measures, $V = Dis_b/Dis_t$, was used to estimate
the variance explained by a given solution, such that in the extreme
case in which all cells are clustered together or the case in which each
cell is a single cluster, this ratio would be 0 and 1, respectively. Exploring
this ratio, we then selected the solutions that are near plateau
(k=10, ..., 15). (2) We then performed differential expression analysis 
(see ‘Differential expression analysis’) to search for gene markers that 
are significantly more highly expressed in a specific cluster as compared
to all other clusters. Then, to avoid complex solutions, we excluded 
solutions with clusters that have too few marker genes (<20) distinguishing
between them and the rest of the cells. (3) Finally, we performed
a robustness analysis and selected the clustering solution with
the highest median robustness score. Specifically, to determine the 
robustness of each clustering solution, we performed 100 iterations 
in which we randomly removed 10% of the cells, and re-ran the k-means 
algorithm and checked the stability of the clustering solution. We quan- 
tified the agreement of a given solution with the original one as the 
number of pairs of cells that were either clustered together, or not 
clustered together, in both solutions, divided by the total number of pairs
shared between the runs. This process yielded a median robustness 
measure of 0.96 for the selected k=11. 
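
Two of the quantities used in this selection procedure lend themselves to a compact illustration: the between-cluster distance ratio from step (1) and the pair-agreement score from step (3). The Python sketch below assumes a precomputed symmetric distance matrix and cluster labels; all function and variable names are hypothetical, and the k-means runs themselves are not reproduced.

```python
from itertools import combinations


def between_cluster_ratio(dist, labels):
    """Ratio of between-cluster to total pairwise distance.

    dist:   symmetric matrix (list of lists) with a zero diagonal
    labels: cluster label for each cell, in the same order as dist
    """
    n = len(labels)
    between = 0.0
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += dist[i][j]
            if labels[i] != labels[j]:
                between += dist[i][j]
    return between / total


def pair_agreement(labels_a, labels_b):
    """Fraction of shared cell pairs treated the same way by two solutions.

    labels_a, labels_b: dicts mapping cell identifier to cluster label.
    A pair agrees if it is co-clustered in both solutions or in neither.
    """
    shared = sorted(set(labels_a) & set(labels_b))
    agree = 0
    total = 0
    for u, v in combinations(shared, 2):
        same_a = labels_a[u] == labels_a[v]
        same_b = labels_b[u] == labels_b[v]
        agree += int(same_a == same_b)
        total += 1
    return agree / total


# Toy example: three cells, then a subsampled re-clustering of four cells.
dist = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
print(between_cluster_ratio(dist, [0, 0, 1]))
full = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}
rerun = {"c1": 5, "c2": 5, "c3": 7}  # c4 was dropped; label names may differ
print(pair_agreement(full, rerun))
```

Comparing pairs rather than labels makes the score insensitive to arbitrary renumbering of clusters between runs.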


Differential expression analysis 

In all cases, differential expression analysis was applied to all genes 
that had an average expression level log₂(TPM + 1) > 2 in either of the tested
groups, G₁ and G₂. Then, for each gene i, we counted the number of cells
in G₁ and G₂ that expressed it with an expression level log₂(TPM + 1) > 2
or < 2. We then applied Fisher’s exact test to the corresponding 2 × 2
table. To identify significant differences, we considered genes with a
Bonferroni-corrected q < 0.05 and log₂-transformed fold change > 0.5.
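
The per-gene test described above can be sketched in a few lines of standard-library Python: build the 2 × 2 table of expressing and non-expressing cells, compute a two-sided Fisher's exact P value, and apply a Bonferroni correction. The function names and the toy expression values are hypothetical, and the two-sided definition used here (summing all tables no more probable than the observed one) is a common convention assumed for this example.

```python
from math import comb


def expression_table(group1, group2, threshold=2.0):
    """Count expressing and non-expressing cells in each group."""
    a = sum(value > threshold for value in group1)
    c = sum(value > threshold for value in group2)
    return a, len(group1) - a, c, len(group2) - c


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x expressing cells in group 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(prob(x) for x in range(low, high + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Toy log2(TPM + 1) values for one gene in two groups of three cells.
table = expression_table([3.1, 4.0, 2.5], [0.0, 1.2, 0.5])
p = fisher_exact_two_sided(*table)
print(table, p, bonferroni(p, n_tests=5))
```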


CyTOF antibody conjugation 

In-depth characterization of B cells from responders and non-respond- 
ers was performed using metal-tagged antibodies. Metal conjugated 
antibodies were purchased from Fluidigm or conjugated to unlabelled 
antibodies in-house. All unlabelled antibodies were purchased in 
carrier-free form and conjugated with the corresponding metal tag 
using Maxpar X8 polymer per manufacturer’s instructions (Fluidigm). 
Metal isotopes were acquired from Fluidigm and indium (III) chloride 
was acquired from Sigma-Aldrich. Antibody concentration was deter- 
mined by measuring the amount of A280 protein using Nanodrop 2000 
(Thermo Fisher Scientific). Conjugated antibodies were diluted using 
PBS-based antibody stabilizer supplemented with 0.05% sodium azide 
(Sigma-Aldrich) to a final concentration of 0.5 mg ml⁻¹. Antibodies used
with the corresponding metal tag isotopes: CD45 (Fluidigm, HI30,
⁸⁹Y), CD80 (Biolegend, 2D10, In), CD138 (BD Biosciences, MI15, Pr),
CD19 (Fluidigm, HIB19, Nd), CD5 (Fluidigm, UCHT2, Nd), HLA-ABC
(BD Biosciences, EMR8-5, Nd), CD178 (Biolegend, NOK1, Nd), IgD
(Biolegend, IA6-2, Nd), CD20 (Fluidigm, 2H7, ¹⁴⁷Sm), PDL1 (Fluidigm,
29E.2A3, ¹⁴⁸Nd), HLA-DR (Biolegend, L243, ¹⁴⁹Sm), CD25 (BD Biosciences,
2A3, ¹⁵⁰Nd), IgM (Biolegend, MHM-88, Eu), CD95 (BD Biosciences, DX2,
¹⁵²Sm), CXCR5 (Fluidigm, RF8B2, Eu), CD86 (BD Biosciences, IT2.2,
¹⁵⁴Sm), CD27 (Fluidigm, L128, Gd), CXCR3 (Biolegend, G025H7, Gd),
CD10 (Fluidigm, HI10a, ¹⁵⁸Gd), PDL-2 (Biolegend, 24F.10C12, ¹⁵⁹Tb), CD39
(Fluidigm, A1, ¹⁶⁰Gd), BAFF-R (Biolegend, 11C1, Dy), CD79b (Fluidigm,
CB3.1, Dy), CD1d (Biolegend, 51.1, Dy), CD23 (Fluidigm, EBVCS-5,
¹⁶⁴Dy), CD40 (Biolegend, 5C3, Ho), CD24 (BD Biosciences, ML5, Er),
CD38 (BD Biosciences, HIT2, Er), CD21 (Biolegend, Bu32, Er), ICOS
(Biolegend, C398.4A, Tb), CTLA4 (Fluidigm, 14D3, ¹⁷⁰Er), CD9 (Biolegend,
HI9a, Yb), CD11c (Biolegend, Bu15, Yb), CD14 (Biolegend,
HCD14, Yb), PD1 (Miltenyi, PD1.3.1.3, Yb), CXCR4 (Biolegend, 12G5,
Lu), CD22 (Biolegend, HIB22, Yb), CD3 (Biolegend, UCHT-1, Pt),
cisplatin (Fluidigm, ¹⁹⁸Pt) and CD16 (Fluidigm, 3G8, ²⁰⁹Bi).


Sample preparation and acquisition 

Peripheral blood mononuclear cells and tumour cells were collected 
and washed twice with wash buffer (0.5% bovine serum albumin (BSA) 
in PBS). For tumour, this included 9 responders and 9 non-responders, 
and for peripheral blood mononuclear cells, 8 responders and 8 non- 
responders. To determine the live population, cells were stained with 
1 µM cisplatin for 3 min. The reaction was stopped with FACS buffer
(2% fetal bovine serum (FBS) in PBS), and the cells were washed once
with wash buffer. Cells were then incubated with 5 µl of Fc receptor
blocking buffer reagent (Miltenyi) for 10 min at room temperature.
Cells were incubated with surface antibodies at room temperature
for 60 min, washed twice with wash buffer and stored overnight in 1
ml of 1.6% paraformaldehyde (EMD Biosciences) in PBS with 125 nM
iridium nucleic acid intercalator (Fluidigm). The next day, samples
were washed twice with cell staining buffer, resuspended in 1 ml of
MilliQ dH₂O, filtered through a 35-µm nylon mesh (cell strainer cap
tubes, BD) and counted. Before analysis, samples were resuspended
in MilliQ dH₂O supplemented with EQ four element calibration beads
at a concentration of 0.5 x 10° per ml. Samples were acquired at 300 
events per second on a Helios instrument (Fluidigm) using the Helios
6.5.358 acquisition software (Fluidigm). 


Data analysis 

Mass cytometry data were normalized based on EQ four element signal
shift over time using Fluidigm normalization software 2. Initial data
processing was performed using FlowJo version 10.2. Initially, all responder
and non-responder normalized FCS files were either concatenated or 
separately exported for downstream analyses. Data were processed and 
analysed using Cytobank; CD19⁺ sample ‘clean-up’ was performed by
gating on intact (Ir⁺ DNA stain), no beads (¹⁴⁰Ce⁻), live (¹⁹⁸Pt⁻), no T cells
CD3⁻ (Pt), no monocytes CD14⁻ (Yb) and CD45⁺ (⁸⁹Y), no natural
killer cells CD16⁻ (²⁰⁹Bi), CD19⁺ B cells. Mass cytometry complex data
were analysed using viSNE, in combination with heat map, to identify
distinct subpopulations using the following parameters: CD19 (Nd),
CD20 (¹⁴⁷Sm), CD5 (Nd), HLA-ABC (Nd), IgD (Nd), PDL1 (¹⁴⁸Nd),
HLA-DR (¹⁴⁹Sm), CD25 (¹⁵⁰Nd), IgM (Eu), CD95 (¹⁵²Sm), CXCR5 (Eu),
CD86 (¹⁵⁴Sm), CD27 (Gd), CXCR3 (Gd), CD10 (¹⁵⁸Gd), CD39 (¹⁶⁰Gd),
BAFFR (Dy), CD79b (Dy), CD1d (Dy), CD23 (¹⁶⁴Dy), CD40 (Ho),
CD24 (Er), CD38 (Er), CD9 (Yb), CD11c (Yb), CXCR4 (Lu), and
CD22 (Yb). Samples with fewer than 200 CD45⁺CD19⁺ B cells were not
used for downstream analyses. Percentages of different subpopulations 
of B cells were measured in aggregated responder and non-responder
peripheral blood cells and tumour samples for each run; statistical
analyses were performed using an unpaired Student’s t-test.


Analysis of peripheral blood exosomes from human plasma 
Approximately 1 ml of plasma per patient sample contained in a cryovial
was thawed rapidly in a 37 °C water bath. The plasma was transferred
into a 1.5-ml Eppendorf tube and centrifuged at room temperature
for 5 min at 800g and 10 min at 2,000g. The supernatant was filtered
with a 0.22-µm filter (6789-1302) directly into an ultracentrifuge tube
(Z80615SCA, 331372). A distinct filter was used for each 500 µl of plasma
filtered, and each filter was subsequently cleared with 2 × 1 ml PBS, all
of which was collected into the ultracentrifuge tube. Additional PBS
was added to the ultracentrifuge tube to reach 11 ml. The tubes were
then ultracentrifuged at 4 °C for 15-16 h at 100,000g using a Beckman
Optima XE-90 ultracentrifuge. The pellet was resuspended in 200-300
µl of PBS by pipetting up and down. The exosomes contained in this
resuspension were stored at −80 °C until further use.


Flow cytometric analyses of exosomes 

Exosomes were thawed on ice. Concentration was determined using
the NanoSight NS300 nanoparticle tracking analyser according to the
manufacturer’s directions, and 15 µl of exosomes (which was equivalent
to approximately 4 x 10° particles on average) were mixed with
30 µl of pre-washed anti-human CD63-coated Dynabeads (Invitrogen,
10606D). For one sample, the NanoSight measurement was erroneous
and excluded. All samples were included in the flow cytometric
analyses. Round-bottom 2-ml tubes were used. All pre-washes and washes
thereafter were performed using 0.22-µm-filtered 0.1% BSA in PBS (0.1%
BSA/PBS) and the samples were mixed well by pipetting up and down
at each wash step. One hundred microlitres of 0.1% BSA/PBS was
added to the beads + exosomes mixture for a final volume of 145 µl (15
µl of exosomes + 30 µl of Dynabeads + 100 µl of 0.1% BSA/PBS). The
samples were mixed by pipetting up and down and allowed to incubate
for 4-16 h at room temperature on a benchtop rotator. Three hundred
microlitres of 0.1% BSA/PBS was added to the samples and the samples
were placed on a magnet (1 min incubation minimum). The supernatant
was discarded and the beads (and bound exosomes) were washed once
with 400 µl of 0.1% BSA/PBS.

The beads (with bound exosomes) were resuspended in 400 µl of 0.1%
BSA/PBS and subsequently split into four distinct round-bottom 2-ml
tubes, each containing 100 µl. To each of these tubes, either antibodies
or isotype controls were added. These included: PE/Cy7 anti-human
CD20 (Biolegend, 302312, clone 2H7) or isotype control PE/Cy7 mouse
IgG2b (Biolegend, 400326, clone MPC-11); APC/Cy7 anti-human CD27
(Biolegend, 356424, clone M-T271) or isotype control APC/Cy7 mouse
IgG1 (Biolegend, 400128, clone MOPC-21); PE/Cy7 anti-human CD9
(Biolegend, 312116, clone HI9a) or isotype control PE/Cy7 mouse IgG1
(Biolegend, 400126, clone MOPC-21); and Alexa Fluor 647 anti-human
CD63 (Biolegend, 353016, clone H5C6) or isotype control Alexa Fluor
647 mouse IgG1 (Biolegend, 400130, clone MOPC-21). For each antibody
or isotype control, 0.4 µg was added to each tube. The samples were
allowed to incubate at room temperature for 1-3 h, in the dark. Three hundred
microlitres of 0.1% BSA/PBS was added to the samples and the
samples were placed on a magnet (1 min incubation). The supernatant
was discarded and the beads (and bound exosomes) were washed once
with 400 µl of 0.1% BSA/PBS. The beads were visible on the magnet at each
step of the procedure described above. The supernatant was discarded
and the beads were resuspended in 200 µl of 0.1% BSA/PBS and transferred
into flow cytometry tubes for flow cytometry analysis. The flow
cytometry data were captured within 24 h of completing the staining of 
the beads-exosomes samples. If not read immediately after completing 
the staining, the flow cytometry tubes were stored at 4 °C in the dark. 
The data were subsequently analysed using FlowJo. Responder versus 
non-responder status was blinded until flow cytometry data capture 
and FlowJo analyses were completed. 

For GPC1 staining, three tubes of beads with exosomes were processed
in parallel. One tube did not receive any antibody (exosomes
alone), one tube received primary antibody (1 h) followed by secondary
antibody (1 h), and one tube received secondary antibody only (1 h). All
three tubes were processed similarly, including a wash step after
1 h (after the primary antibody incubation, 300 µl of 0.1% BSA/PBS was
added to the samples, the samples were placed on a magnet for 1
min, and the beads were then resuspended in 100 µl of 0.1% BSA/PBS),
and again 1 h later (after the secondary antibody incubation),
before being transferred into a flow cytometry tube. All incubations
were carried out at room temperature and protected from light, and
beads were visible at each step when placed on the magnet. Rabbit
anti-human GPC1 antibody (Sigma, SAB2700282, 3 µl per tube) and
Alexa Fluor 488-conjugated goat anti-rabbit IgG (Invitrogen, A-11008,
2 µl per tube) were used. The
samples were analysed by flow cytometry.


Nanoimager analyses 

Beads with exosomes stained for flow cytometry analysis for CD63
(Alexa Fluor 647 anti-human CD63) or isotype control as described
above (see ‘Flow cytometric analyses of exosomes’) were evaluated
using the Nanoimager S Mark I from ONI (Oxford Nanoimaging)
with the lasers 405 nm/150 mW, 488 nm/200 mW, 561 nm/300 mW and
640 nm/1 W and dual emission channels split at 560 nm. Data were
processed on NimOS (v.1.25) from ONI. In brief, 25 µl of sample was
spotted onto a slide (Fisher Scientific, 12-550-15), covered with a 1.5H
coverslip (Zeiss, 474030-9000), and immediately placed on the stage.
All images were captured using HILO mode (highly inclined and
laminated optical sheet) at an illumination angle of 35.0° with a 10.0-ms
exposure setting for 200 frames. To minimize photobleaching, the
focal plane of the beads was found under the 405 nm laser at 37%
power, then switched to the 640 nm laser at 25% power for image
acquisition.


Electron microscopy analyses 

Beads only and beads with exosomes were prepared as described above
(‘Flow cytometric analyses of exosomes’). The samples were magnetized
and resuspended in 50 µl of 1% glutaraldehyde in PBS at 4 °C, or in
30 µl of 0.1% BSA/PBS, and mixed with 30 µl of warm (60 °C) 1% agarose
in distilled water. The agarose-bead mixture was allowed to cool on ice,
and the gels were cut into approximately 1-mm³ pieces and placed in
1% glutaraldehyde in PBS at 4 °C. Fixed samples were washed in 0.1 M
sodium cacodylate buffer and treated with 0.1% Millipore-filtered
cacodylate-buffered tannic acid, postfixed with 1% buffered osmium, and
stained en bloc with 1% Millipore-filtered uranyl acetate. The samples
were dehydrated in increasing concentrations of ethanol, infiltrated,
and embedded in LX-112 medium. The samples were polymerized in
a 60 °C oven for approximately 3 days. Ultrathin sections were cut in
a Leica Ultracut microtome (Leica), stained with uranyl acetate and
lead citrate in a Leica EM Stainer, and examined in a JEM 1010
transmission electron microscope (JEOL) at an accelerating voltage of 80 kV.
Digital images were obtained using AMT Imaging System (Advanced
Microscopy Techniques). Two-sided Mann-Whitney U-test was used
to determine significance.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 

The additional datasets generated during and/or analysed during the 
current study for Clinical Trial NCT02519322 are now available in the 
European Genome-phenome Archive repository (EGAS00001003178).
Other datasets generated during and/or analysed during the current 
study are available from the corresponding author on reasonable 
request. 


Code availability 


The authors declare that the code for reproducibility of data is
publicly available or will be available upon request.
Extended Data Fig. 1 | MCP-counter results in patients with melanoma and
RCC treated with pre-surgical ICB or targeted therapy. a, Supervised
clustering by response of MCP-counter scores in on-treatment samples from a
cohort of high-risk patients with resectable melanoma treated with
neoadjuvant ICB, with responders defined as achieving a complete or partial
response by RECIST 1.1 (n = 11 NR and 9 R). b, Analysis shown by unsupervised
hierarchical clustering of baseline (n = 11 NR and 10 R) and on-treatment
samples (n = 11 NR and 9 R) from the neoadjuvant melanoma cohort. Unique
clusters identified are indicated by shaded boxes on top row. c, Unsupervised
hierarchical analysis shown for metastatic RCC patients (same cohort as
Fig. 1d; n = 11 PD and 17 PR). Response (PR, partial response) or
non-response (PD, progressive disease) as measured by RECIST 1.1. Unique
clusters identified are indicated by shaded boxes on top row. d, Supervised
clustering by response of MCP-counter scores from OpACIN-neo clinical trial
(NCT02437279) of neoadjuvant versus adjuvant ICB in high-risk resectable
melanoma (n = 6 NR and 12 R). Responders were defined as patients who did not
have a relapse. e, Supervised clustering by response of MCP-counter scores in
combined pre-treatment and on-treatment biopsies from a cohort of high-risk
resectable melanoma patients treated with neoadjuvant targeted therapy
(dabrafenib and trametinib) as part of NCT02231775 (n = 7 patients for
baseline and n = 8 patients for on-treatment samples), with responder defined
as achieving a complete or partial response by RECIST 1.1 and non-responder
defined as having stable or progressive disease. Pathological response is
defined by the presence or absence of viable tumour at time of surgical
resection. P values were determined using two-sided Mann-Whitney U-test.
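
As an illustration only, the two-sided Mann-Whitney U-test named in these captions can be sketched in plain Python. The version below is an exact permutation form that enumerates every assignment of the pooled ranks, which is practical only at group sizes like those reported here. It is an assumption about how such a test could be reproduced, not the authors' code, and the function names are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Assign 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U statistic for x, two-sided exact permutation P value)."""
    ranks = midranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    offset = n1 * (n1 + 1) / 2
    u_obs = sum(ranks[:n1]) - offset
    centre = n1 * n2 / 2  # expected U under the null hypothesis
    extreme = total = 0
    # Enumerate every way of choosing which pooled ranks belong to group x.
    for combo in combinations(ranks, n1):
        total += 1
        if abs(sum(combo) - offset - centre) >= abs(u_obs - centre) - 1e-9:
            extreme += 1
    return u_obs, extreme / total
```

For larger cohorts the enumeration grows quickly, and a library routine with a normal approximation would be the more usual choice.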
Extended Data Fig. 2 | Representation of MCP-counter scores for all patient
cohorts and analyses of peripheral blood exosomes. a-c, Box plot
representation of heat maps for patients with: high-risk resectable melanoma
treated with neoadjuvant ICB (n = 11 NR and 10 R for baseline and n = 11 NR
and 9 R on treatment) as presented in Fig. 1c and Extended Data Fig. 1a, b (a);
metastatic RCC treated with pre-surgical ICB as presented in Fig. 1d and
Extended Data Fig. 1c (n = 11 PD and 17 PR) (b); and high-risk resectable
melanoma treated with ICB as part of OpACIN-neo trial as presented in
Extended Data Fig. 1d (n = 6 NR and 12 R) (c). For a-c, medians with
interquartile range are shown. P values were determined by two-sided
Mann-Whitney U-test. d, Schematic for exosomal analyses of serum samples from
patients with melanoma on neoadjuvant ICB trial. e, Representative
transmission electron micrographs showing Dynabead with exosomes present
after immunocapture. f, Nanoimager-captured images of the beads coated with
CD63⁺ exosomes as compared with isotype control. g, h, Exosomal concentration
(g) and mean exosomal size (h) for serum samples for responders and
non-responders at the time point indicated. i, Ratio of mean fluorescent
intensity (MFI) of beads stained with anti-CD63 as compared to isotype
control. j, Ratio of mean fluorescent intensity of beads stained with
anti-CD9, -CD20, -CD27 and -GPC1 antibodies as compared to isotype control
(or secondary antibody only for GPC1). For e-j, bars indicate median values
and individual data points representing 8 R and 5 NR (unless indicated in the
Methods) in addition to interquartile ranges. P values were determined using
two-sided Mann-Whitney U-test.
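
The MFI ratios in panels i and j amount to dividing the stained-bead intensity by the matching control intensity and then summarizing each response group by its median. A minimal sketch follows, assuming one stained and one control MFI per sample; the function names and the example numbers are hypothetical, not values from the study.

```python
from statistics import median


def mfi_ratio(stained_mfi, control_mfi):
    """Ratio of stained-bead MFI to isotype (or secondary-only) control MFI."""
    if control_mfi <= 0:
        raise ValueError("control MFI must be positive")
    return stained_mfi / control_mfi


def group_medians(ratios_by_group):
    """Median ratio per response group, e.g. {'R': [...], 'NR': [...]}."""
    return {group: median(values) for group, values in ratios_by_group.items()}
```

A ratio above 1 indicates more signal on the stained beads than on the control beads for that sample.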
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Extended Data Fig. 3| Transcriptional analysis of tumour specimens from 
patients with metastatic RCC treated with pre-surgical ICB. a, Supervised 
hierarchical clustering by response of RCC tumour specimens at baseline of 
most DEGs by microarray analysis, with response defined as having a partial 
(n = 11 PD and 17 PR). Fold change and P values are calculated by the limma
package as described in the Methods. A cut-off of gene expression fold change
of >2 or <0.5 and an FDR q < 0.05 was applied to select DEGs. b, Volcano plot
depiction of DEGs by response from same cohort.
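
The DEG cut-off described above (fold change >2 or <0.5 together with FDR q < 0.05) can be sketched as follows. This is an illustrative filter only: it assumes the fold changes and raw P values have already been computed elsewhere (the caption attributes them to limma), it assumes the FDR is the Benjamini-Hochberg procedure, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so each q is a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def select_degs(genes, fold_changes, p_values, fc_hi=2.0, fc_lo=0.5, q_max=0.05):
    """Keep genes with fold change > fc_hi or < fc_lo and q < q_max."""
    q = benjamini_hochberg(p_values)
    return [gene for gene, fc, qv in zip(genes, fold_changes, q)
            if (fc > fc_hi or fc < fc_lo) and qv < q_max]
```

The fold changes here are on the linear scale, matching the >2 or <0.5 wording; on a log2 scale the equivalent thresholds would be above 1 or below -1.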


[Extended Data Fig. 4 image. Panels a, d: heat maps of MCP-counter scores
(T cells, CD8 T cells, cytotoxic lymphocytes, NK cells, B lineage, monocytic
lineage, myeloid dendritic cells, neutrophils, endothelial cells) by immune
class, with q values; class sizes shown in a are n = 37 (27.2%), n = 65
(47.8%) and n = 34 (25.0%). Panels b, c, e, f: Kaplan-Meier curves against
time in months with number-at-risk tables, by immune class (A, B, C) or by
B-lineage-low versus B-lineage-high group. P values shown on the survival
plots: adjusted P (A vs C) = 0.0068, adjusted P (B vs C) = 0.4, P = 0.053
and P = 0.24.]

Extended Data Fig. 4 | Immune infiltrate is prognostic of improved
disease-specific survival in TCGA cutaneous melanoma cohort but not the
clear-cell RCC cohort. a, Unsupervised hierarchical analysis of TCGA SKCM
RNA-seq data using MCP-counter scores identifies three MICs with differential
presence of individual cell types as indicated. Numbers of patients in each
class are shown on top of the plot. P value determined by two-sided
Kruskal-Wallis rank-sum test and q value calculated by FDR. b, Kaplan-Meier
estimates of overall survival of MIC groups. c, Kaplan-Meier estimates of
overall survival by B cell lineage scores shown by high and low groups
dichotomized by median values. Overall survival was defined as the time
interval from date of accession for each sample to date of death or censoring
from any cause (Methods). d, Unsupervised hierarchical analysis of TCGA KIRC
RNA-seq data using MCP-counter scores identifies three immune classes with
differential presence of individual cell types as indicated. Numbers of
patients in each class are shown at top of plot. P values determined by
two-sided Kruskal-Wallis rank-sum test and q value calculated by FDR.
e, Kaplan-Meier estimates of overall survival probability of immune class
groups. f, Kaplan-Meier estimates of overall survival probability by B cell
lineage scores shown by high and low groups dichotomized by median values.
For both, overall survival was defined as the time interval from date of
accession for each sample to date of death or censoring from any cause. For
b, c, e, f, patient numbers are included in the table below the graph and
P values were calculated by log-rank test.
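
The two steps this caption relies on, splitting patients at the median B cell lineage score and estimating survival per group, can be sketched in a few lines. This is a hedged illustration rather than the analysis code: the handling of scores equal to the median is an assumption, the log-rank comparison is not included, and all names are hypothetical.

```python
from statistics import median


def dichotomize_by_median(scores):
    """Label each sample 'high' or 'low' relative to the cohort median."""
    cut = median(scores.values())
    # Assumption: scores equal to the median are placed in the 'low' group.
    return {sample: "high" if value > cut else "low"
            for sample, value in scores.items()}


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an event.

    events[i] is 1 for death and 0 for censoring at times[i].
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all subjects who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Calling `kaplan_meier` once per label returned by `dichotomize_by_median` gives the two step curves that a high-versus-low plot would draw.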


[Extended Data Fig. 5 image. Panels a, b: stained tissue sections (SYTO13,
MECA79, CD20, CD4; anti-CD20, anti-CD21, anti-CD23) including patient 2 and
patient 4 (lymph node, on treatment). Panel c: MCP-counter B-cell lineage
score and MS4A1 expression (FPKM) at baseline and on treatment, by treatment
arm (Nivo, Ipi+Nivo).]

Extended Data Fig. 5 | TLSs found in nodal and non-nodal metastases are
consistent with mature secondary follicular-like TLSs with modest
correlation with gene expression data. a, Representative TLSs in tumours
from patients with melanoma treated with neoadjuvant ICB demonstrating
maturation status as indicated by the presence of follicular dendritic cells
(CD21) and germinal centre B cells (CD23). We also include multiplex
immunohistochemistry for SYTO13, MECA79, CD20 and CD4 (with magnified
view of individual TLSs indicated by white box on the right). Circles denote
defined TLSs based on multiplex immunohistochemistry. Black line
approximates tumour border. b, Representative TLSs from non-lymph node
metastases on additional patients with metastatic melanoma indicated by H&E
staining, as well as singlet staining for CD20, CD21 and CD23. Black line on
H&E image denotes tumour border. c, Comparison of baseline and on-treatment
gene expression with MCP-counter analyses for B cell lineage as well as MS4A1
expression by RNA-seq for patients with high-risk resectable melanoma
treated with ICB as part of clinical trial (n = 11 NR and 10 R for baseline
and n = 11 NR and 9 R for on-treatment). Response and treatment arm as
indicated. Medians with interquartile range are shown. P values were
determined by two-sided Mann-Whitney U-test. d, Linear regression modelling
of MCP-counter values for B cell lineage with regards to CD20 counts
(n = 10 NR and 7 R) and TLS density (n = 10 NR and 6 R) as indicated.
e, Linear regression modelling of MS4A1 gene expression with regards to CD20
counts (n = 10 NR and 7 R) and TLS density (n = 10 NR and 6 R) as indicated.
These represent on-treatment time points. For d, e, r² values calculated by
linear regression and P values for non-zero slope as calculated by Prism
v.8.0.0.
Extended Data Fig. 6 | TLSs are associated with response in RCC similar to
those observed in melanoma. a, Multiplex immunohistochemistry images
from three additional patients with advanced melanoma treated with
neoadjuvant ICB. Staining as indicated and similar to Fig. 2. b, Quantification
of CD20 cells by singlet immunohistochemistry and association with response to
neoadjuvant ICB in metastatic RCC, with responders defined as having partial
response and non-responders as having progressive disease by RECIST 1.1
(n = 10 PD and 8 PR at baseline and n = 5 PD and 11 PR on treatment).
c, d, Density of TLSs (n = 10 PD and 9 PR at baseline and n = 5 PD and 9 PR
on treatment) (c) and ratio of tumour area occupied by TLSs (n = 10 PD and
7 PR at baseline and n = 5 PD and 11 PR on treatment) (d) and correlation by
treatment response. Bars indicate median values and interquartile ranges are
shown. P values were determined by two-sided Mann-Whitney U-test.
e-g, Representative image of CD20 staining in responder with TLSs, associated
H&E slide, singlet stains and characterization by multiplex
immunofluorescence of TLSs. h, Multiplex immunohistochemistry images from
three additional patients with RCC treated with pre-surgical ICB. Staining as
indicated and similar to g.
Extended Data Fig. 7 | TLSs are associated with markers of T cell activation
and response and B cell proliferation. NanoString GeoMx Digital Spatial
Profiling was used to perform high-plex proteomic analysis with spatial
resolution. a, Example of selection of ROIs (200 µm × 200 µm) from
representative patients with melanoma treated with neoadjuvant ICB
including ROIs containing TLSs and ROIs outside the context of a TLS (non-TLS).
ROI selection was completed using H&E staining and confirmed with
immunofluorescence as shown using S100B and PMEL, SYTO13, CD3 and CD20.
Masking for B cells and T cells as indicated based on CD3 and CD20 staining.
b, Fold change (log₂-transformation) in expression of various markers of T cell
activation and response in TLS-associated T cells as compared to T cells found
outside the TLS per individual slide. Data show individual TLS ROI values
divided by the average non-TLS value of that slide. Increased expression in the
context of TLSs is represented by shaded pink box (>0). c, Average
log₂-transformed fold change of expression for TLS-associated T cells as
compared to non-associated T cells. Individual dots represent individual
patients/slides as indicated. Data show the average log₂-transformed count per
TLS ROI value minus the average log₂-transformed count per non-TLS ROI value
per slide for each protein queried. For b and c, increased expression in the
context of TLSs is represented by shaded pink box (>0). Median and
interquartile range are indicated. Error bars indicate 95% confidence
intervals. d, Levels of Ki67 protein expression in B cell masks of non-TLSs
and TLS ROIs by individual patient as indicated. Counts are represented as
signal-to-noise ratios of Ki67 compared to geometric means of isotype
controls. Median and interquartile range are indicated. Error bars indicate
95% confidence intervals, and P values were determined by Student’s t-test.
For a-d, the numbers of ROIs analysed for each patient are 11 for patient 1,
12 for patient 2, 12 for patient 10, 7 for patient 17 and 7 for patient 19.
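
The two quantities defined in this caption can be written out directly: the per-slide log₂ fold change in panel c (mean log₂ count over TLS ROIs minus mean log₂ count over non-TLS ROIs) and the signal-to-noise ratio in panel d (count divided by the geometric mean of the isotype controls). The sketch below is illustrative; the function names and the example counts are hypothetical, and it assumes all counts are positive.

```python
from math import log2
from statistics import geometric_mean, mean


def log2_fold_change(tls_counts, non_tls_counts):
    """Mean log2 count in TLS ROIs minus mean log2 count in non-TLS ROIs."""
    tls = mean([log2(c) for c in tls_counts])
    non_tls = mean([log2(c) for c in non_tls_counts])
    return tls - non_tls


def signal_to_noise(count, isotype_counts):
    """Count relative to the geometric mean of the isotype control counts."""
    return count / geometric_mean(isotype_counts)
```

A value above 0 from `log2_fold_change` corresponds to the shaded region described in the caption, that is, higher expression in the TLS context.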
[Extended Data Fig. 8 image. Axis labels: clonal count (normalized), clonal
proportion, top 5 clonotypes summed clonal expression, BCR repertoire
diversity; patients grouped as NR and R; chains IgH and IgL; baseline; one
annotated P value, p = 0.0008.]

Extended Data Fig. 8 | BCR analyses of intratumoral B cells in patients with
advanced melanoma before and after treatment with neoadjuvant ICB.
baseline samples further evaluated in Fig. 3a and on-treatment samples in a.
Patients are grouped as responders and non-responders and identified as
indicated in which each bar represents individual patient. c, d, Summed
expression of top five clonotypes in normalized read counts (c) and BCR
repertoire diversity (d) for responders and non-responders for both IgH and
IgL at baseline (n = 11 NR and 10 R for IgH and IgL) and on-treatment
(n = 10 NR and 9 R for IgH and n = 11 NR and 9 R for IgL). Box plot shows
median and interquartile range. P values determined by two-sided
Mann-Whitney U-test.
Extended Data Fig. 9 | Single-cell RNA-seq analysis reveals unique clusters of
B cells associated with response to ICB. a, Scatter plots comparing the
percentage of CD45⁺ cells staining positive for CD19⁺ (B cells) as indicated
between responder (n = 17) and non-responder (n = 31) samples with all time
points combined or stratified by pre- and post-treatment as indicated. Data are
median and interquartile range. P values were determined by two-sided
Mann-Whitney U-test. b, Heat map displaying scaled expression values
(log₂(TPM + 1)) of discriminative genes from all B cells between responder
(blue) and non-responder (red) samples. Top marker genes are shown for each
group. c, Heat map showing scaled expression values (log₂(TPM + 1)) of
discriminative gene sets per cluster as defined in Fig. 3c. A list of
representative genes is shown per cluster next to the left margin. For both
heat maps, colour scheme is based on z-scores from −2.5 (blue) to 2.5 (yellow).
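
The heat-map values described here, log₂(TPM + 1) scaled to z-scores and displayed between −2.5 and 2.5, can be sketched for a single gene as below. The caption does not say whether values beyond the colour range are clipped or which standard deviation is used, so the clipping and the population standard deviation are both assumptions, and the function name is hypothetical.

```python
from math import log2
from statistics import mean, pstdev


def scaled_expression(tpm_values, limit=2.5):
    """z-score log2(TPM + 1) across samples and clip to [-limit, limit]."""
    logged = [log2(v + 1) for v in tpm_values]
    centre = mean(logged)
    spread = pstdev(logged)  # assumption: population standard deviation
    if spread == 0:
        # A gene with identical expression everywhere has no contrast.
        return [0.0] * len(logged)
    return [max(-limit, min(limit, (v - centre) / spread)) for v in logged]
```

Adding 1 before the logarithm keeps genes with zero TPM finite, which is why the transformation is written as log₂(TPM + 1).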
Extended Data Fig. 10 | Mass cytometry reveals significant differences in
B cell populations between responder and non-responder tumours. a, Pie
charts representing composition of individual tumour and peripheral blood
samples for patients with melanoma treated with ICB used in all analyses for
mass cytometry. Matched patient samples are located directly beneath one
another. Samples from patients with lymph node or non-lymph-node
metastases as indicated. Cell types as indicated. Asterisk indicates samples
included in t-SNE plots and pie charts in c, Fig. 3d-f, and phenographs in
Extended Data Fig. 11. b, Scatter plots demonstrating quantification of
different peripheral blood and intratumoral B cell phenotypes. Median and
interquartile range are shown. All samples are represented in b (for tumour,
n = 7 R and 3 NR and, for peripheral blood, n = 4 R and 4 NR). P values were
determined by one-sided Mann-Whitney U-test. c, t-SNE plots demonstrating
intratumoral B cell phenotypes from the neoadjuvant ICB trial in patients with
advanced melanoma grouped by response and including further breakdown of
memory cell subtypes and germinal centre B cells. Plots represent combined
analyses of tumours run simultaneously with the peripheral blood samples
(n = 5 R and 3 NR) and include baseline and on-treatment samples as detailed in
Supplementary 31. d, Quantification of B cell subtypes in tumour from mass
cytometric analyses in responders and non-responders from all tumours
(n = 7 R and 3 NR). Median and interquartile range are shown. P values were
determined by one-sided Mann-Whitney U-test.
Extended Data Fig. 11 | Surface expression of markers analysed by mass
cytometry. a, Individual phenographs for surface expression of each marker
blood samples from patients with melanoma treated with ICB ran together
surface markers indicated. c, Percentage of CD45⁺CD19⁺ cells in tumour by
response (responder versus non-responder) that are positive for each of the
surface markers indicated. For b and c, all samples are represented (for
tumour, n = 7 R and 3 NR and, for peripheral blood, n = 4 R and n = 4 NR).
Error bars indicate median and interquartile range. P values were determined
by two-sided Mann-Whitney U-test.
Methodology


Sample preparation Peripheral blood mononuclear cells (PBMCs) and tumor cells were harvested and washed twice with wash buffer (0.5% bovine 
Soft-tissue sarcomas represent a heterogeneous group of cancer, with more than 50
histological subtypes. The clinical presentation of patients with different subtypes
is often atypical, and responses to therapies such as immune checkpoint blockade
vary widely. To explain this clinical variability, here we study gene expression
profiles in 608 tumours across subtypes of soft-tissue sarcoma. We establish an
immune-based classification on the basis of the composition of the tumour
microenvironment and identify five distinct phenotypes: immune-low (A and B),
immune-high (D and E), and highly vascularized (C) groups. In situ analysis of an
independent validation cohort shows that class E was characterized by the presence
of tertiary lymphoid structures that contain T cells and follicular dendritic cells and
are particularly rich in B cells. B cells are the strongest prognostic factor even in the
context of high or low CD8⁺ T cells and cytotoxic contents. The class-E group
demonstrated improved survival and a high response rate to PD1 blockade with
pembrolizumab in a phase 2 clinical trial. Together, this work confirms the immune
subtypes in patients with soft-tissue sarcoma, and unravels the potential of
B-cell-rich tertiary lymphoid structures to guide clinical decision-making and
treatments, which could have broader applications in other diseases.


Soft-tissue sarcomas (STSs) comprise many histological subtypes with
distinct clinical and biological behaviours. Genetically ‘simple’ STSs are
characterized by translocations that result in fusion proteins and few,
if any, other genomic lesions, whereas ‘complex’ STSs have an unbalanced
karyotype and several genomic aberrations. STSs are considered
‘non-immunogenic’ with a low mutational burden. Among complex
tumours, undifferentiated pleomorphic sarcoma (UPS), dedifferentiated
liposarcoma (DDLPS) and, to a lesser extent, leiomyosarcoma
(LMS) can exhibit durable responses to immune-checkpoint blockade,
whereas simple tumours do not respond to PD1 monotherapy or a
combination of anti-PD1 and anti-CTLA4 antibodies. Few reports
investigating the composition of the tumour microenvironment (TME)
in different STS histologies have been published, but a recent study
from The Cancer Genome Atlas (TCGA) consortium suggested an
association with prognosis.

Here, we developed a new classification of STS, based on the composition
of the TME in large cohorts of STS, using the microenvironment
cell populations (MCP)-counter method. We found that the B lineage
signature, a hallmark of an immune-high class we called E, correlated
with an improved survival of patients with STS, in tumours with both
high or low infiltration of CD8⁺ T cells. In an independent cohort, we
used immunohistochemistry to validate the high density of B cells and
presence of tertiary lymphoid structures (TLS) in class E. Finally, we
showed that class E exhibited the highest response rate to PD1 blockade
therapy and improved progression-free survival in a multicentre phase
2 clinical trial of pembrolizumab in STS (SARC028).
Fig. 1 | The SICs exhibit strongly different TMEs. This figure refers to the TCGA
SARC cohort (n = 213). a, Composition of the TCGA SARC cohort by SIC, and
histology. b, Composition of the TME by SIC as defined by the MCP-counter
Z-scores. NK cells, natural killer cells. c, Expression of gene signatures related to
the functional orientation of the immune TME by SIC. d, Expression of genes
related to immune checkpoints by SIC. Adjusted P values are obtained from
Benjamini-Hochberg correction of two-sided Kruskal-Wallis test P values.


Immune classification of STS 


The TME compositions from four independent discovery primary STS
datasets (TCGA SARC, Gene Expression Omnibus (GEO) accessions
GSE21050, GSE21122 and GSE30929) (Extended Data Table 1) with publicly
available gene expression profiles were analysed by MCP-counter,
a gene-expression-based TME deconvolution tool. An immune-based
classification of STS was developed from this analysis (Extended Data
Fig. 1, Methods) and tumours were assigned to one of five sarcoma immune
classes (SICs), labelled A, B, C, D and E, with highly distinct profiles (Fig. 1).
We compared the SIC distribution across histological subtypes and found
that most LMS tumours were classified to SICs A and B (Fig. 1a). DDLPS
accounted for half of SIC C tumours. Tumours classified as SICs D and E
were more evenly distributed across histological subtypes. Application
of the predictor of the immune classes (Methods) to other STS histologies
from French Sarcoma Group (FSG) cohort (Extended Data Table 1) revealed
that all SICs could be identified in each histology (Extended Data Fig. 2a).

The TME composition differs significantly between SICs (Fig. 1b).
Three SICs showed homogeneous profiles. SIC A, ‘immune desert’,
was characterized by the lowest expression of gene signatures related
to immune cells, as well as low vasculature. SIC C, ‘vascularized’, was
dominated by a high expression of endothelial-cell-related genes. SIC
E, ‘immune and TLS high’, was characterized by the highest expression
of genes specific to immune populations such as T cells, CD8⁺
T cells, natural killer cells and cytotoxic lymphocytes. Notably, a key
determinant of SIC E was the high expression of the B lineage signature
(P=1.8 x10’). SICs B and D were characterized by heterogeneous but
generally immune-low and immune-high profiles, respectively.

The expression of genes associated with T cell or myeloid cell chemotaxis,
T cell activation and survival, major histocompatibility complex
class I, and regulatory gene signatures was high in SICs D and E,
intermediate in SICs B and C, and very low in SIC A (Fig. 1c). Expression of
the lymphoid-structures-associated B-cell-specific chemokine CXCL13
was notably high in E tumours, moderate in D tumours, generally low
in B and C tumours, and negligible in A tumours.

The expression of immune-checkpoint-related genes (Fig. 1d) followed
that of immune infiltrates, with high expression of the genes
encoding PD1, PDL2, CTLA4 and TIM3 (PDCD1, PDCD1LG2, CTLA4 and
HAVCR2, respectively) in SIC E followed by SIC D tumours, and
low-to-very-low expression in SIC C, B and A tumours. CD274 (which encodes
PDL1) was heterogeneously expressed across SICs, whereas LAG3 was
expressed at high levels only in SIC E tumours, and its expression was
low in all other classes. The above findings were consistent across the
four discovery cohorts (Extended Data Fig. 3).
Fig. 3 | TLSs are a distinguishing feature of the immune-high class of STS.
This figure refers to the NTUH cohort (n = 93). a, Populational characterization
of TLSs. Left, examples of two tertiary lymphoid structures by
immunohistochemistry, identified as CD3⁺ T cell (blue) aggregates containing
DC-LAMP⁺ mature dendritic cells (red, red arrows) and juxtaposing CD20⁺ B cell
aggregates (brown). Right, representative immunofluorescence staining of a
TLS for CD3 (magenta), CD20 (green) and PD1 (cyan). DAPI staining is shown in
blue. The multispectral image shows CD3⁺PD1⁺ double-positive cells (yellow
arrows). b, Functionality of TLSs. Left, CXCR5⁺ (magenta), CD4⁺ (yellow) and
PD1⁺ (green) cells in zones 1 and 2 of the same TLS. Multispectral fluorescence
images of zones 1 and 2 show CXCR5⁺CD4⁺PD1⁺ triple-positive cells (red arrows)
characteristic of T follicular helper cells. Right, CD20⁺ cells stained in pink (left)
on consecutive sections of a TLS. CD23 (green on left) and CD21 (brown on
right) positive cells with reticular morphology characteristic of follicular
dendritic cells (yellow arrow, zone 3). PNAd⁺ structures (brown, green arrow)
with high endothelial venule morphology are also detectable nearby (zone 4).
c, Number of TLS among 5 SICs of 73 tumours of NTUH cohort (n = 73).
d, Characterization of the immune infiltrate in tumours according to TLS
presence (TLS⁻ n = 82, TLS⁺ n = 11, total n = 93). Densities of CD3⁺ (left), CD8⁺
(centre) and CD20⁺ (right) cells in tumours lacking or containing TLSs;
densities including (total) or excluding (excl) TLS are indicated for the TLS⁺
tumours. Box plots represent median (larger bar) and interquartile range (IQR).
Upper whisker extends to whichever is minimal, maximum or third quartile
plus 1.5x IQR. Lower whisker extends to whichever is maximal, minimum or first
quartile minus 1.5x IQR. P values were determined by chi-squared test (c) or
two-sided Mann-Whitney tests (d).


SICs are associated with patient survival

After confirmation that the two cohorts with available survival data (TCGA
SARC, n = 213; GSE21050, n = 283) exhibited similar survival patterns (data
not shown), the cohorts were pooled to study the clinical outcome of the five
SICs (Fig. 2a). Patients with SIC A exhibited the shortest overall survival
compared with group D or E patients (P = 0.048 and P = 0.025, respectively).
Similarly, among the other STS histologies from the FSG cohort, patients with
SIC A had a shorter overall survival than patients with SIC E (Extended Data
Fig. 2b). In a multivariate model with classical prognostic factors (Fig. 2b),
SICs were found to be significantly associated with prognosis, independent
of other clinical parameters (as compared with SIC A; P = 0.011 and P = 0.029,
for SICs D and E, respectively). Tumours were separated between high
and low expression of CD8⁺ T cells, cytotoxic lymphocytes and B lineage
signatures based on the observation of the MCP-counter scores distribution
(Extended Data Fig. 4). Detailed analysis of the effect of these immune
cell population signatures revealed that whereas neither CD8⁺ T cells
(P = 0.277) (Fig. 2c) nor cytotoxic lymphocytes (P = 0.0513) (Extended Data
Fig. 4 | SICs are strongly associated with STS response to PD1 blockade
therapy. This figure refers to the SARC028 cohort (n = 47). a, Relationship
between SIC, histology and response to treatment in the SARC028 cohort. b,
Waterfall plot showing the best response to pembrolizumab as a percentage
change in the size of target lesions from baseline (n = 45). Tumour sizes were
calculated as the sum of target lesion diameters. Colours indicate the SIC to
which each tumour was assigned. Dashed lines indicate +20%, -30% and -100%
change from baseline levels. SIC E versus other comparison was performed
using a two-sided Mann-Whitney test. CR, complete response; PD, progressive
disease; PR, partial response; SD, stable disease; SS, synovial sarcoma. c,
Progression-free survival of patients by tumour SIC (n = 47).


Fig. 5a) significantly correlated with survival, the B lineage signature was significantly
associated with improved overall survival (P=4.25 x10“) (Fig. 2d).
When analysed in the context of high or low infiltration of CD8+ T cells
(Fig. 2e), cytotoxic lymphocytes or the expression of PDCD1 (PD1), CD274
(PDL1) or FOXP3 (Extended Data Fig. 5b-e), the B lineage signature was the
dominant parameter for improved survival, regardless of the expression
of other immune factors. In addition, SIC E tumours demonstrate high
expression of both IGJ (also known as JCHAIN) and TNFRSF17 (encoding
BCMA) (data not shown), which indicates that plasma cells may contribute
to improved prognosis.


Mutational landscape of SICs in TCGA SARC 


The overall tumour mutational burden was low across the studied
cohorts (median: 32 non-synonymous mutations) and appeared to
be similar across all SICs (Extended Data Fig. 6a). However, a few highly
mutated tumours (each with more than 250 non-synonymous mutations)
were found in the D and E groups. Qualitative mutational analysis
revealed several commonly mutated genes across the cohort, including
TP53 (35.2%), ATRX (16.0%), TTN (9.9%), RB1 (8.9%), MUC16 (8.0%), PCLO
(6.1%), DNAH5, MUC17 and USH2A (5.2% each) (Extended Data Fig. 6b).
TP53 was more frequently mutated among SICs D and E tumours
(P = 0.01) (Extended Data Fig. 6c).

The landscape of copy-number variations, assessed on the TCGA 
SARC cohort, revealed differences between histologies, consistent with 
previous observations. However, there was no notable difference in
copy-number variation between SICs (data not shown). 


In situ validation of SIC profiles in tumours 
To validate the TME profiles of SICs in situ, we analysed an independent 
cohort of 93 STS cases (NTUH cohort) (Extended Data Table 1). Seventy- 
three samples passed quality control for transcriptomic analysis using 
Nanostring nCounter technology. We classified this cohort into the same 
five SICs (Methods) with the following distribution: A, 16 (21.9%); B, 19 
(26.0%); C, 10 (13.7%); D, 17 (23.3%); and E, 11 (15.1%). The NTUH cohort 
samples exhibited gene-expression-based TME profiles that were similar 
to that of TCGA SARC and GSE21050 cohorts (Extended Data Fig. 7a). 
By quantitative immunohistochemistry, immune-desert SIC A was 
characterized by very low densities of CD3*, CD8* or CD20* cells, 
whereas immune-and-TLS-high SIC E exhibited high densities of 
these cells (pairwise comparison, P= 4.01 x 10°, P= 6.64 x 10° and 
P=9.90x107, respectively). The vascularized SIC C exhibited a moderate
infiltration by immune cells and a high density of CD34+ endothelial
cells (Extended Data Fig. 7b, c).


TLSs are a feature of SIC E tumours

The CXCL13 chemokine, which is associated with the presence of TLSs,
was strongly expressed in SIC E tumours (Fig. 1c, Extended Data Fig. 2c).
Expression of CXCL13 was highly correlated with that of the TLS-associated
12-chemokine signature (Extended Data Fig. 8a), which suggests
that TLSs could be a marker of SIC E. TLSs were defined as a CD20+ B-cell
follicle juxtaposed to a CD3+ T cell aggregate containing at least one
DC-LAMP+ (also known as LAMP3+) mature dendritic cell (Fig. 3a,
left). A strong association between SICs and the presence of TLSs was
identified (P=3.13 x 10°) (Fig. 3c). No TLSs were observed in tumours 
from SICs A, C and D, and only one tumour from SIC B had one TLS. 
By contrast, nine out of eleven (82%) SIC E tumours exhibited one or 
more TLS. All TLSs were intratumoural (Extended Data Fig. 8b), and 
found at the periphery and in the centre of the tumour in all histologies
(Extended Data Fig. 8c, d). 

We observed the presence of CD3+PD1+ T cells (Fig. 3a, right) in the
germinal centre of TLSs with characteristics of follicular T helper
cells (positive for CD4, PD1 and the CXCL13 receptor CXCR5) (Fig.
3b, left), CD23+CD21+ cells with reticular morphology characteristic of
follicular dendritic cells, and peripheral node addressin (PNAd)-positive
structures with high endothelial venule morphology (Fig. 3b, right).
Germinal centres are a hallmark of secondary follicle-like TLSs (SFL-TLS),
the final maturation step of TLSs; the earlier steps are early TLSs
(E-TLS) and primary follicle-like TLSs (PFL-TLS). E-TLS, PFL-TLS and
SFL-TLS represented 60.5%, 21.1% and 18.3%, respectively, of all TLSs 
analysed (Extended Data Fig. 8e, f). This differed between histologies 
(P=7.76 x 10°), with UPS having only 16.7% of E-TLS. 

Tumours with TLSs (11.8%, 11 out of 93) had significantly higher den- 
sities of tumour-infiltrating CD3* T cells (P= 4.0 x 10°), CD8* T cells 
(P=1.8 x10“) and CD20*B cells (P=1.5 x 10>) (Fig. 3d). This association 
persisted even if T and B cells within TLSs were excluded from the analy- 
sis (P=1.5 x10“, P=3.8 x 10 * and P=7.9 x 10”, respectively) (Fig. 3d), 
which suggests that high immune cell infiltration is not limited to TLSs. 
SICs predict patient response to PD1 blockade


We examined whether SICs can predict the patient response to checkpoint
blockade therapy. We obtained 47 pre-treatment STS metastasis
biopsies from patients enrolled in the SARC028 clinical trial
and its expansion cohort (Extended Data Table 1), which evaluated
the efficacy of the anti-PD1 monoclonal antibody pembrolizumab in
patients with metastatic STS. Of these 47 patients, 1 achieved a complete
response, 9 a partial response, 17 stable disease and 20 had progressive
disease (Fig. 4a). Pre-treatment tumours were classified into SICs based
on gene-expression data. The objective response rate (ORR) (which
accounts for complete and partial responses), as evaluated by the response
evaluation criteria in solid tumours (RECIST), was 21.2% in the
overall cohort. SICs, however, showed substantial variation in ORR,
with SIC E patients exhibiting the highest ORR (50%, 5 out of 10), followed
by SIC D (25%, 3 out of 12) and SIC C (22%, 2 out of 9) (Fig. 4a). A
complete response was found only in SIC E, as well as one patient who
had a 100% change in target lesions but a non-complete response in
non-target lesions and thus did not qualify for a complete response.
Notably, there were no responders within the SIC A (0 out of 5) and B
(0 out of 11) groups (Fig. 4a). Overall, SIC E tumours were associated
with the highest response rate to pembrolizumab in comparison with
tumours from other SICs (P = 0.026, Fig. 4b). Patients with SIC E tumours
also exhibited improved progression-free survival compared with
patients with SIC A or B tumours (P = 0.023 and P = 0.0069, respectively)
(Fig. 4c).
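
The per-class response rates quoted above follow directly from the responder counts. The following minimal Python sketch recomputes them from the counts stated in the text; the dictionary layout and the function name are illustrative assumptions, not part of the study's code.

```python
# Responders (complete + partial response) and patients per SIC,
# taken from the counts stated in the text for the SARC028 cohort.
counts = {
    "A": (0, 5),
    "B": (0, 11),
    "C": (2, 9),
    "D": (3, 12),
    "E": (5, 10),
}


def objective_response_rate(responders, total):
    """Percentage of patients with a complete or partial response."""
    return 100.0 * responders / total


orr_by_sic = {
    sic: objective_response_rate(r, n) for sic, (r, n) in counts.items()
}
total_responders = sum(r for r, _ in counts.values())
total_patients = sum(n for _, n in counts.values())

for sic, orr in orr_by_sic.items():
    print(f"SIC {sic}: {orr:.0f}% ({counts[sic][0]} out of {counts[sic][1]})")
```

The class totals sum to the 47 patients and 10 responders (1 complete and 9 partial responses) reported for the cohort.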


Discussion 


This study is, to our knowledge, the most comprehensive analysis of
the STS immune TME and the first to evaluate the prognostic effect of
immune infiltrates by simultaneously integrating several immune cell
populations and malignant cell characteristics. Previous studies have
examined the immune profile of STS tumours, but the importance of B
cells and TLSs was not investigated. The clinical effect of CD8+ T cells
and PD1 expression has yielded controversial results. Here, we
found the CD8+ T cell signature and PD1 were expressed in class D and
E SICs, which are associated with favourable outcomes, providing high
infiltration of B cells. The integrative analysis demonstrates that infiltration
by B cells is a key discriminative feature of a group of patients
with improved survival. This B-cell-high group was found to respond
better to PD1 blockade therapy, although this should be validated on
a larger cohort.

The field of immuno-oncology is rapidly expanding, and it is crucial
to accurately identify patients who are likely to respond. Here, we
propose a classification for STS that is immune-centric with prognostic
effect. It defines a group of patients with a better response to
anti-PD1 therapy marked by B cells and TLSs. This finding may have
broad applications. Sarcomas are considered immune-quiescent
tumours, with a low mutational burden. Nevertheless, our data show
that some STSs are immunogenic and that this is driven by B cells.
Further work is needed to extend these findings to all STS histologies
and other cancers. Similarly, the underlying mechanisms require
further investigation, but a possible explanation is that TLSs are sites
at which anti-tumoral immunity is generated, with B cells instructing
T cells, in particular CD8+ T cells, to recognize tumour-associated
antigens. It is noteworthy that TLS-rich tumours are more infiltrated
by CD8+ T cells. These T cells can become exhausted, explaining the
correlation of the expression of immune checkpoints (such as PD1
and LAG3) with TLSs, and why treatment with checkpoint inhibitors
may allow productive anti-tumour immunity in TLS-rich tumours.
Overall, our findings lay the foundation for a tool to risk-stratify
patients with STS and identify those who may be more likely to benefit
from immunotherapies, and may be broadly applicable to other
malignancies.
Methods


Ethics and patients 

Patients diagnosed with DDLPS, LMS and UPS were identified and 
the pathology diagnosis was confirmed by a certified pathologist in 
National Taiwan University Hospital. The research was approved by 
the Research Ethics Committee of NTUH (201605061RINA) for this 
retrospective study. Formalin-fixed paraffin-embedded (FFPE) blocks
were retrieved and 4–5-µm-thick slides were taken for immunohistochemistry
staining and RNA extraction for Nanostring testing. Other
cohorts were previously published.


Establishing the immune classification of STS 

To establish a robust immune classification of STS, publicly available 
transcriptomic data from TCGA data portal and the GEO repository 
representing four large and independent patient cohorts were included. 
Only tumours from the most common histologies of genomically complex
STS were included: LMS, UPS and DDLPS. We analysed data from
the TCGA SARC (ref. 8; n = 213), GSE21050 (n = 283), GSE21122 (n = 72) and
GSE30929 (n = 40) cohorts.


Public transcriptomic data pre-processing 

Transcriptomic data were downloaded from the TCGA data portal (SARC
cohort) and GEO (accessions GSE21050, GSE21122 and GSE30929).
TCGA SARC was restricted to complex genomics sarcomas (UPS, DDLPS
and LMS). Normalized TCGA SARC RNA-sequencing data were log2-transformed.
Microarray data were normalized using the frozen-RMA
method from the R package frma. Batch effects were corrected across
series using ComBat, with histology as a covariate.


Estimation of the TME composition 

The TME composition of each tumour was assessed with the MCP-counter
tool, which provides abundance scores for eight immune
(T cells, CD8+ T cells, cytotoxic lymphocytes, natural killer cells, B cell
lineage, monocytic lineage, myeloid dendritic cells and neutrophils)
and two stromal populations (endothelial cells and fibroblasts). The
scores are based on analysis of transcriptomic markers, that is, transcriptomic
features that are strongly, specifically and stably expressed
in a unique cell population. These scores are proportional to the abundance
of each cell population in the tumour, therefore allowing intersample
comparison and large cohort analyses. The MCP-counter
signature compositions are as follows: T cells: CD28, CD3D, CD3G, CD5,
CD6, CHRM3-AS2, CTLA4, FLT3LG, ICOS, MAL, PBX4, SIRPG, THEMIS,
TNFRSF25 and TRAT1; CD8+ T cells: CD8B; cytotoxic lymphocytes: CD8A,
EOMES, FGFBP2, GNLY, KLRC3, KLRC4 and KLRD1; B lineage: BANK1,
CD19, CD22, CD79A, CR2, FCRL2, IGKC, MS4A1 and PAX5; natural killer
cells: CD160, KIR2DL1, KIR2DL3, KIR2DL4, KIR3DL1, KIR3DS1, NCR1,
PTGDR and SH2D1B; monocytic lineage: ADAP2, CSF1R, FPR3, KYNU,
PLA2G7, RASSF4 and TFEC; myeloid dendritic cells: CD1A, CD1B, CD1E,
CLEC10A, CLIC2 and WFDC21P; neutrophils: CA4, CEACAM3, CXCR1,
CXCR2, CYP4F3, FCGR3B, HAL, KCNJ15, MEGF9, SLC25A37, STEAP4,
TECPR2, TLE3, TNFRSF10C and VNN3; endothelial cells: ACVRL1, APLN,
BCL6B, BMP6, BMX, CDH5, CLEC14A, CXorf36 (also known as DIPK2B),
EDN1, ELTD1, EMCN, ESAM, ESM1, FAM124B, HECW2, HHIP, KDR, MMRN1,
MMRN2, MYCT1, PALMD, PEAR1, PGF, PLXNA2, PTPRB, ROBO4, SDPR,
SHANK3, SHE, TEK, TIE1, VEPH1 and VWF.


Intracohort immune classifications 

The fibroblasts signature was removed from this analysis as all STS 
tumours exhibited high and homogeneous scores for this cell popu- 
lation, which is consistent with the mesenchymal origin of STS. The 
signature for CD8 T cells was removed from the analysis for GSE21050, 
GSE21122 and GSE30929 as it showed very small variation across all 
samples in these microarray-based cohorts. Unsupervised clustering
of samples in each cohort was performed based on the metagene
Z-score for the included populations of MCP-counter (Extended Data
Fig. 9a–d) using R software, with the Euclidean distance and Ward’s
linkage criterion, using the gplots package. The TCGA SARC, GSE21050,
GSE21122 and GSE30929 cohorts were separated into 6, 9, 7 and 6 
groups, respectively. The number of clusters was chosen empirically 
following the dendrograms shown in Extended Data Fig. 9a—d. Analysis 
of the intersample variance revealed that much of the explainable vari- 
ance was already attained at the chosen number of clusters as visualized 
in Extended Data Fig. 9e-h. 


Pan-cohort immune classes 

To aggregate the above four intracohort classifications, the transcrip- 
tome matrix of each cohort was independently zero-centred for each 
gene across all samples. Then, we computed the centroids of each class
over the whole transcriptome and analysed the Pearson correlations 
between all the centroids on the set of genes shared across the four 
cohorts (Extended Data Fig. 9i). From these correlations, we deduced 
five SICs. The tumours from six remaining cohort-specific clusters 
shared intermediate/weak correlation patterns to other clusters and 
were temporarily labelled as ‘unclassified’. 
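
The aggregation step described above (zero-centring each gene within a cohort, computing class centroids, then correlating centroids across cohorts on shared genes) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the toy expression values, gene names (`g1` to `g3`), class labels and function names are all hypothetical, and the study's own analysis was run in R on whole transcriptomes.

```python
from statistics import mean


def zero_centre(matrix):
    """Subtract each gene's mean across samples.

    matrix maps gene -> list of expression values (one per sample).
    """
    return {g: [v - mean(vals) for v in vals] for g, vals in matrix.items()}


def class_centroids(matrix, labels):
    """Mean centred expression of every gene within each class."""
    return {
        c: {
            g: mean([v for v, lab in zip(vals, labels) if lab == c])
            for g, vals in matrix.items()
        }
        for c in sorted(set(labels))
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def centroid_correlation(cent_a, cent_b):
    """Correlate two centroids on the genes they share."""
    shared = sorted(set(cent_a) & set(cent_b))
    return pearson([cent_a[g] for g in shared], [cent_b[g] for g in shared])


# Hypothetical toy cohorts: three shared genes, four samples, two classes each.
cohort1 = {"g1": [1, 2, 9, 10], "g2": [5, 5, 1, 1], "g3": [0, 1, 0, 1]}
cohort2 = {"g1": [0, 2, 10, 12], "g2": [6, 6, 2, 2], "g3": [3, 3, 3, 3]}
cent1 = class_centroids(zero_centre(cohort1), ["x", "x", "y", "y"])
cent2 = class_centroids(zero_centre(cohort2), ["p", "p", "q", "q"])

r_xp = centroid_correlation(cent1["x"], cent2["p"])  # similar profiles
r_xq = centroid_correlation(cent1["x"], cent2["q"])  # opposite profiles
```

In this toy case, classes `x` and `p` correlate strongly and would be merged into one pan-cohort class, whereas `x` and `q` are anti-correlated.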


Prediction of the immune classes 

Centroids of SICs were computed on MCP-counter intraseries Z-scores 
for T cells, cytotoxic lymphocytes, B cell lineage, natural killer cells, 
monocytic lineage, myeloid dendritic cells, neutrophils and endothelial 
cells, on all cohorts. To predict de novo the immune classes of each of 
the cohorts, MCP-counter Z-scores were computed, and each sample 
was assigned to the closest immune class based on its Euclidean distance
to the related centroids. The SIC labels used are the ones predicted
using this method. Principal component analysis of the 608 samples
on the MCP-counter scores shows that the intra-SIC homogeneity was
improved by this prediction step (Extended Data Fig. 9j, k), as confirmed
by supervised tests across SICs (Extended Data Fig. 9l, m).
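
The prediction rule described above is a nearest-centroid assignment on Z-scored MCP-counter scores. A minimal sketch follows, assuming Z-scores use the sample standard deviation (the convention of R's `scale`; the text does not state which was used). The centroid values and the three-population layout are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, stdev


def z_scores(values):
    """Z-score a list of scores within one series (sample standard deviation)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def nearest_class(sample, centroids):
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda c: math.dist(sample, centroids[c]))


# Hypothetical centroids over three populations
# (for example T cells, B lineage, endothelial cells), in Z-score units.
toy_centroids = {
    "A": [-1.0, -1.0, -1.0],
    "C": [-0.5, -0.5, 1.5],
    "E": [1.0, 1.0, 1.0],
}

label_1 = nearest_class([0.8, 1.2, 0.9], toy_centroids)
label_2 = nearest_class([-0.4, -0.6, 1.2], toy_centroids)
```

Each sample is compared against every centroid, so the cost grows with the number of classes times the number of populations, which is negligible for five classes.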


Gene signatures for the functional orientation 

The signatures used to determine the functional orientation of the TME 
were derived from the literature. The signatures were the following:
immunosuppression (CXCL12, TGFB1, TGFB3 and LGALS1), T cell activa- 
tion (CXCL9, CXCL10, CXCL16, IFNG and IL15), T cell survival (CD70 and 
CD27), regulatory T cells (FOXP3 and TNFRSF18), major histocompat- 
ibility complex class I (HLA-A, HLA-B, HLA-C, HLA-E, HLA-F, HLA-G and 
B2M), myeloid cell chemotaxis (CCL2), and tertiary lymphoid structures 
(CXCL13). For each signature, scores were computed as the geometric 
mean signature expression. 
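
The geometric-mean scoring can be illustrated with a short Python sketch. The gene lists below are copied from the text for a subset of signatures; the expression values are hypothetical, and the sketch assumes strictly positive expression values, since the text does not say on which scale the geometric mean was taken.

```python
from statistics import geometric_mean

# Subset of the signatures listed in the text.
SIGNATURES = {
    "T cell survival": ["CD70", "CD27"],
    "regulatory T cells": ["FOXP3", "TNFRSF18"],
    "myeloid cell chemotaxis": ["CCL2"],
    "tertiary lymphoid structures": ["CXCL13"],
}


def signature_score(expression, genes):
    """Geometric mean of the expression of the signature genes."""
    return geometric_mean([expression[g] for g in genes])


# Hypothetical expression values for one tumour.
tumour = {
    "CD70": 4.0, "CD27": 9.0, "FOXP3": 2.0,
    "TNFRSF18": 8.0, "CCL2": 5.0, "CXCL13": 7.5,
}
scores = {name: signature_score(tumour, genes)
          for name, genes in SIGNATURES.items()}
```

For a single-gene signature such as CXCL13, the score reduces to the expression of that gene.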


De novo prediction of the immune classes of additional cohorts 
and other platforms 

The predictor described above was adapted to analyse new and independent
samples from Nanostring-analysed FFPE samples. In a first
step, SICs were estimated on the NTUH cohort by sorting samples on
the B lineage signature, the T cell signature and then the endothelial cell
signature, and assigning each sample to the SIC it resembled the most.
As described above, centroids of each SIC were then computed on the Z-scores
of the Nanostring MCP-counter scores, and samples were reassigned
to the SIC with the closest centroid. For new samples
from the SARC028 cohort, MCP-counter scores for T cells, cytotoxic
lymphocytes, B lineage and endothelial cells were computed and transformed
to Z-scores. Distances to the Nanostring-defined centroids presented
above were computed with the Euclidean metric, and samples were
assigned to the SIC with the lowest distance.


RNA extraction from FFPE tumours 

Human FFPE tumour specimens were cut into 3-µm-thick sections and
were reviewed under a microscope for tumour histology. Non-tumour
tissues were excluded and tumour tissues were deparaffinized with
deparaffinization solution (Qiagen 19093) and RNA was extracted
with the RNeasy FFPE kit (Qiagen 73504) according to the manufacturer's protocol.
RNA quality and size distribution were determined by the Agilent
2100 Bioanalyzer with RNA analysis kits (RNA 6000 nano kit 5067-1511, 
RNA 6000 nano reagent 5067-1512, RNA 6000 nano ladder 5067-1529, 
RNA 6000 pico kit 5067-1513, RNA 6000 pico reagents 5067-1514, RNA 
6000 pico ladder 5067-1535) for cohorts NTUH core and NTUH whole, 
and by the Agilent RNA ScreenTape assay (catalogue: RNA ScreenTape 
5067-5576, RNA ScreenTape sample buffer 5067-5577, RNA ScreenTape 
ladder 5067-5578) and Agilent 2200 TapeStation for cohort SARC028.
The samples from SARC028 were separately quality-controlled by the
sarcoma pathology group at MD Anderson Cancer Center. 


Nanostring nCounter analysis 

The RNA was analysed using the nCounter Technology (Nanostring 
Technologies) as per the manufacturer’s protocol. Data were normal- 
ized using the nSolver software (Nanostring Technologies). 


Enzymatic and fluorescent multiplexed immunohistochemistry 
The FFPE human tumour and control specimens were cut into
3-µm-thick sections. Human FFPE tonsil sections were used as positive
controls for CD3, CD4, CD8, CD20, CD21, CD23, CD34, CXCR5,
DC-LAMP, PD1, PDL1 and PNAd; placenta sections were used in addition
for PDL1 and cerebral cortex tissue was used as a negative control. The
specificity of all antibodies was tested by the manufacturers and the
specificity of anti-PD1 antibodies was validated in our laboratory on
overexpressing cell pellets as previously reported. Antigen retrieval
was carried out on a PT-link (Dako) using the EnVision FLEX Target
Retrieval Solutions at High pH (Dako, K8004) or Low pH (Dako, K8005).
Endogenous peroxidase activity and non-specific Fc receptor binding
were blocked with 3% H2O2 (Gifrer, 10603051) and Protein Block (Dako,
X0909), respectively. The primary and secondary antibodies used for
immunohistochemistry and immunofluorescence are summarized in 
Extended Data Table 2. Immunohistochemistry and immunofluores- 
cence images were independently analysed blindly by three observers 
(L.L., C.S.-F. and G.L.). 


Enzymatic immunohistochemistry 

The stainings were performed with an Autostainer Link 48 (Dako). Chro- 
mogenic detection was performed using 3,3’-diaminobenzidine (Dako, 
K3468) for CD8, CD20, CD21, PDL1 and PNAd; 3-amino-9-ethylcarbazole 
substrate (Vector Laboratories, SK-4200) for DC-LAMP; Blue Alkaline 
Phosphatase Substrate (Vector Laboratories, SK5300) for CD3; High- 
Def red IHC chromogen (AP) (Enzo, ADI-950-140-0030) for CD20; and 
Permanent HRP Green (Zytomed Systems, ZUCO70-100) for CD23 and 
CD34. The nuclei were counterstained with haematoxylin (Dako, S3301).
After mounting with Glycergel Mounting Medium (Dako, C056330-2)
or EcoMount (Biocare Medical, EM897L), the slides were scanned with
a Nanozoomer (Hamamatsu). For CD3, CD8, CD20 and DC-LAMP markers,
the density of positive cells per mm² was quantified with Calopix
Software (Tribvn). For the CD34 marker, the density of positive vessels per
mm² was quantified with Halo10 software (Indica labs). TLSs were identified
using the registration module to fit one slide on the other (Halo10
software, Indica labs). Tumours were considered TLS-positive when a
CD3 aggregate with DC-LAMP staining was found juxtaposing a CD20
aggregate. Only aggregates with surface above 60,000 µm², containing
at least 700 cells and at least 350 CD20+ cells were considered.
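
The TLS-calling criteria stated above translate into a simple rule. The following Python sketch encodes it; the record type, field names and example measurements are hypothetical, and the actual quantification was done with the image-analysis software named in the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
MIN_AREA_UM2 = 60_000       # surface must be above this value (µm²)
MIN_CELLS = 700             # at least this many cells
MIN_CD20_POS_CELLS = 350    # at least this many CD20+ cells


@dataclass
class Cd20Aggregate:
    """Hypothetical record for one CD20 aggregate measured on a slide."""
    area_um2: float
    n_cells: int
    n_cd20_pos: int
    juxtaposed_cd3_dclamp: bool  # adjacent CD3 aggregate with DC-LAMP staining


def is_tls(agg: Cd20Aggregate) -> bool:
    """Apply the juxtaposition and size criteria to one aggregate."""
    return (
        agg.juxtaposed_cd3_dclamp
        and agg.area_um2 > MIN_AREA_UM2
        and agg.n_cells >= MIN_CELLS
        and agg.n_cd20_pos >= MIN_CD20_POS_CELLS
    )


def tumour_is_tls_positive(aggregates) -> bool:
    """A tumour is TLS-positive if any aggregate meets all criteria."""
    return any(is_tls(a) for a in aggregates)


# Hypothetical measurements for illustration.
large = Cd20Aggregate(85_000, 1_200, 600, True)
too_small = Cd20Aggregate(40_000, 900, 500, True)
no_t_zone = Cd20Aggregate(90_000, 1_500, 800, False)
```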


Fluorescent multiplexed immunohistochemistry 

For the PD1, CD20 and CD3 3-plex staining, a tyramide system ampli- 
fication (TSA) was used. The stainings were performed with a Leica 
Bond RX. The incubation with TSA reagent was performed after the 
incubation of the horseradish peroxidase (HRP)-conjugated polymer 
and was followed by antibody stripping at 97 °C for 10 min. This protocol
was repeated for the second and third primary antibodies and
corresponding polymer incubations. The dilutions used for the TSA are
1:400 for TSA AF488, 1:800 for TSA AF594 and 1:200 for TSA AF647, as
per the manufacturer’s recommendations. For the CXCR5, CD4 and PD1
3-plex staining, we used a conventional fluorescent-dye conjugated secondary
antibody system performed manually (all secondary antibodies
were diluted at 1:100). For all the fluorescent stainings, the nuclei were
stained with DAPI Solution (Thermo Fisher, 62248) at 2 µg ml−1 for 10
min. After mounting with ProLong Gold Antifade Mountant (Thermo
Fisher, P36934), the slides were scanned with a Zeiss Axio Scan.Z1.


Statistical analysis 

All statistical analyses were performed using the R software (v.3.4.4) 
and the packages survival, gplots, dunn.test and FactoMineR. The rela- 
tionship between two categorical variables was estimated with the 
chi-squared test. The relationship between a categorical variable and
a quantitative variable was estimated with the Mann-Whitney U test
(two categories) or the Kruskal-Wallis test (three or more categories).
All tests were two-sided. In cases with three or more categories, pair- 
wise comparisons were carried out with Dunn tests. The relationship 
between two quantitative variables was estimated with the Pearson 
correlation. When appropriate, P values were corrected for multiple 
hypothesis testing with the Bonferroni or Benjamini-Hochberg meth- 
ods, as specified in the text or figure legends. Survival was analysed 
with Kaplan-Meier estimates and log-rank tests. No statistical methods 
were used to predetermine sample size. The experiments were not 
randomized, and investigators were not blinded to allocation during 
experiments and outcome assessment unless stated otherwise. 
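
The two multiple-testing corrections named above can be written in a few lines. The study used R, so this Python sketch is only an illustration of the standard Bonferroni and Benjamini-Hochberg adjustments; the input P values are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each P value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted
    # values never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values from four tests.
raw = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate.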


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The transcriptomic datasets analysed in this study can be accessed on the
GDC Portal (portal.gdc.cancer.gov, cohort TCGA SARC) and the GEO repository
under accession numbers GSE21050, GSE21122 and GSE30929. FSG
cohort data are publicly available from ArrayExpress for gastrointestinal
stromal tumour with accession code E-MTAB-373, and from the GEO for
synovial sarcomas with accession number GSE40021. Myxoid liposarcomas
from the FSG cohort are available from the corresponding authors upon
reasonable request. Immunohistochemistry and gene expression data
related to the NTUH cohort (Fig. 3, Extended Data Figs. 7, 8) are available
upon reasonable request to W.H.F. (herve.fridman@crc.jussieu.fr). The
data that support the findings related to Fig. 4 are available from SARC but
restrictions apply to the availability of these data, which were used under
license for the study. Data are, however, available from H.A.T. (htawbi@mdanderson.org)
upon reasonable request and with permission of SARC.


Code availability 


All code used in this study is available from the corresponding author 
upon reasonable request. 
Extended Data Fig. 5 | B cell infiltration of STS is the key factor associated
with overall survival. This figure refers to TCGA SARC and GSE21050 pooled
cohorts (n = 496). a, Overall survival of patients with STS according to MCP-counter
scores for cytotoxic lymphocytes. b, Overall survival of patients based
on the infiltration level of their tumours by B lineage cells and cytotoxic
lymphocytes. c-e, Overall survival of patients based on degree of tumour
infiltration by B lineage cells and expression of PDCD1 (c), CD274 (d) and FOXP3
(e). The analyses were performed with the Kaplan-Meier estimates and two-sided
log-rank tests. Tumours were considered high for expression of PDCD1,
CD274 and FOXP3 if their expression was above median, and high for B lineage
and cytotoxic lymphocytes if the MCP-counter score was above the third
quartile.


Extended Data Fig. 6 panels (image not reproduced). Axes: number of non-silent mutations (a); mutated tumours (%) (b, c). Genes labelled: TP53, ATRX, TTN, RB1, MUC16, PCLO and ZNF831.


Extended Data Fig. 6 | The mutational landscape of STS tumours does not
vary significantly between SICs. This figure refers to the TCGA SARC cohort
(n = 213). a, Mutational burden according to the SIC of the tumours, expressed
in number of non-silent mutations. P value was computed with a Kruskal-Wallis
test. Box plots as in Fig. 3d. b, Mutation frequency of all genes that are mutated
in greater than 2.5% of tumours. c, Mutation frequency for genes that are
mutated in more than 5% of tumours, according to SICs in the TCGA SARC
cohort. The dashed lines indicate the overall mutation frequency. P values were
obtained through one-sample two-sided t-tests, corrected for multiple testing
with the Bonferroni method. This was applied only to samples that had
mutations on the considered genes (TP53: n = 75; ATRX: n = 34; TTN: n = 21; RB1:
n = 19; MUC16: n = 17; PCLO: n = 13; DNAH5, MUC17 and USH2A: n = 11; PTEN: n = 6;
KRAS: n = 2; BRAF: n = 1).
Extended Data Fig. 7 | Validation of SIC profiles by immunohistochemistry.
This figure refers to the NTUH cohort. a, SIC attribution as defined by gene
expression using the MCP-counter Z-scores in 73 cases. b, Cell density counts
showing the differences in TME composition according to SIC identification of
the 73 cases (SIC A: n = 16; SIC C: n = 10; SIC E: n = 11). P values are determined by
two-sided Kruskal-Wallis (KW) tests. Pairwise comparisons are derived from
the Dunn test. Box plots are as in Fig. 3d. c, Representative images of CD3
(green), CD20 (pink), CD8 (brown) and CD34 (green) expression by
immunohistochemistry of SIC A, C and E tumours. The same area of the tumour
is represented (0.05 mm²) in each image. Similar results were observed on the
other tumours from the same SICs (SIC A: n = 16; SIC C: n = 10; SIC E: n = 11).


Extended Data Fig. 8 panels (image not reproduced). Panel a axes: CXCL13 gene expression and 12-chemokine signature. Maturation stages shown: E-TLS, PFL-TLS and SFL-TLS.
Extended Data Fig. 8 | Location and maturation of TLSs. a, Pearson
correlation between the expression of CXCL13 and the 12-chemokine signature
of TLS in TCGA SARC cohort (n = 213). Samples are coloured according to SICs.
b, Intratumoural location of TLSs in three different examples from the NTUH
cohort: DDLPS, UPS and LMS, respectively. TLSs are observed by the presence
of CD20+ B cell aggregates (brown, surrounded by blue shapes). The red line
delineates the tumoral zone. Similar findings were observed on the 11 tumours
with TLS. c, Definition of peripheral, medium and central zones, accounting for
25%, 25% and 50% of the total tumour area, respectively. d, Distribution of TLSs
in the various zones. Each bar represents one tumour. The letters above bars
indicate the SIC of the tumour when the sample passed quality control of
Nanostring nCounter hybridization. Dots indicate tumours in which SIC could
not be determined because of RNA quality control. Similar images were
observed for 66 E-TLS, 23 PFL-TLS and 20 SFL-TLS. e, Illustration of diverse
degrees of TLS maturation in STS tumours. Consistent with maturation events
occurring in secondary lymphoid organs, three maturation steps have been
described for TLS: E-TLS (bottom), PFL-TLS (middle) and SFL-TLS (top), which
differ in the presence of follicular dendritic cells (FDC) and their markers. E-TLS
contain aggregates of CD20+ B cells and CD3+ T cells without FDC, PFL-TLS
contain CD21+ FDC (red dotted zones) and SFL-TLS contain a germinal centre,
notably visible through the presence of CD21+CD23+ follicular dendritic cells
(yellow dotted zone). DAPI staining is shown in white. DAPI-negative green dots
correspond to fluorescent erythrocytes. f, Distribution of TLS maturation
steps in a subset of tumours. Each bar represents one tumour. Differences
between the number of TLSs observed here and in other figures can be
explained by use of non-consecutive slides or a different tumour block for
some samples.
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Extended Data Fig. 9 | See next page for caption. 
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Extended Data Fig. 9 | Pan-cohort immune classification. This figure refers to 
the four discovery cohorts: TCGA SARC (n= 213), GSE21050 (n= 283), GSE21122 
(n=72) and GSE30929 (n=40).a-d, Heat map and unsupervised hierarchical 
clustering of the MCP-counter scores describing the tumour 
microenvironment. Each of the population is represented by the Z-scores of the 
signature. a, TCGA SARC. b, GSE21050. c, GSE21122. d, GSE30929. e-h, 
Evolution of the variance explained by the clusters as a function of the number 
of clusters. Red dots indicate the number of clusters that was retained in this 
study. Each graph corresponds to the heat map onits left. i, Heat map of the 


Pearson correlation of centroids from each SIC class of discovery cohorts 
(TCGA SARC, GSE21050, GSE21122 and GSE30929, n = 608), with five immune 
classes and two groups of unclassified samples. j,k, Principal component 
analysis of samples from the four discovery cohorts (n= 608), based on their 
normalized and merged MCP-counter scores.j is coloured according to the 
original classes, kis coloured according to the predicted immune classes, 
showing a heightened homogeneity within each SIC class. 1,m, Composition of 
the TME with classes defined as inj and k for the four discovery cohorts 

(n= 608), expressed in cohort-specific row Z-scores. 
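
The row Z-scores mentioned in this caption standardise each cell population across the tumours of one cohort. A minimal Python sketch of that scaling is given here; the score matrix is hypothetical, the function name is an illustration only, and the use of the population standard deviation is an assumption, because the caption does not state which convention was applied.

```python
from statistics import mean, pstdev


def row_z_scores(matrix):
    """Scale every row to mean 0 and standard deviation 1."""
    scaled = []
    for row in matrix:
        mu = mean(row)
        sd = pstdev(row)  # assumption: population SD; a sample SD is an equally common choice
        # A constant row has no spread, so its Z-scores are set to 0.
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


# Hypothetical scores: rows are cell populations, columns are tumours.
scores = [
    [1.0, 2.0, 3.0],
    [10.0, 10.0, 40.0],
]
z = row_z_scores(scores)
```

Scaling within each cohort separately, as the caption describes, keeps platform differences between cohorts from dominating the comparison.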


Extended Data Table 1 | Clinicopathological composition of the cohorts included in this study 

Cohort                  TCGA SARC     GSE21050      GSE21122      GSE30929      FSG           NTUH          SARC028
n                       213           283           72            40            168           93 (SIC: 73)  47
Age (median, range)     63 (33-90)    63 (15-92)    ND            ND            36 (1-83)     58 (9-94)     57 (25-83)
Gender (n, %)
  Male                  98 (46%)      131 (48.7%)   ND            ND            99 (58.9%)    37 (39.8%)    34 (72.3%)
  Female                114 (54%)     138 (51.3%)   ND            ND            69 (41.1%)    56 (60.2%)    13 (27.7%)
STS histology (n, %)
  DDLPS                 58 (27.2%)    62 (21.9%)    46 (63.9%)    40 (100%)     0 (0%)        30 (32.3%)    19 (40.4%)
  LMS                   104 (48.8%)   85 (30%)      26 (36.1%)    0 (0%)        0 (0%)        31 (33.3%)    6 (12.8%)
  UPS                   51 (23.9%)    136 (48.1%)   0 (0%)        0 (0%)        0 (0%)        32 (34.4%)    19 (40.4%)
  Synovial sarcoma      0 (0%)        0 (0%)        0 (0%)        0 (0%)        58 (34.5%)    0 (0%)        3 (6.4%)
  Myxoid liposarcoma    0 (0%)        0 (0%)        0 (0%)        0 (0%)        50 (29.8%)    0 (0%)        0 (0%)
  GIST                  0 (0%)        0 (0%)        0 (0%)        0 (0%)        60 (35.7%)    0 (0%)        0 (0%)
SIC (n, %)
  A                     55 (25.8%)    65 (23%)      14 (19.4%)    8 (20%)       37 (22%)      16 (21.9%)    5 (10.6%)
  B                     52 (24.4%)    78 (27.6%)    22 (30.6%)    13 (32.5%)    37 (22%)      19 (26%)      11 (23.4%)
  C                     35 (16.4%)    39 (13.9%)    8 (11.1%)     6 (15%)       26 (15.5%)    10 (13.7%)    9 (19.1%)
  D                     33 (15.5%)    57 (20.1%)    21 (29.2%)    7 (17.5%)     26 (15.5%)    17 (23.3%)    12 (25.5%)
  E                     38 (17.8%)    44 (15.5%)    7 (9.7%)      6 (15%)       42 (25%)      11 (15.1%)    10 (21.3%)

For cohort GSE21050, sex information could not be retrieved for 14 patients. For cohort NTUH, SIC could be determined for 73 patients only. NA, not available. 


Extended Data Table 2 | Antibodies used for immunohistochemistry and immunofluorescence 


Immunohistochemistry 
Primary antibodies (reference, species, clone or isotype, source): 790-4341, rabbit, 2GV6; CD20, M0755, mouse, IgG2, Agilent; CD21, M0784, mouse, Agilent; CD23, ab16702; CD34; DC-Lamp, DDX0191, rat, IgG2a; a rabbit antibody from Cell Signaling. 
Secondary reagents: Goat Anti-Rabbit; EnVision+ System-HRP, Labelled Polymer (Mouse); EnVision+ System-HRP, Labelled Polymer (Rabbit); Polyview Plus AP (anti-mouse) reagent; Biotin Donkey Anti-Rat IgG; ImmPRESS HRP Anti-Rat (Peroxidase). 
Detection: Alkaline Phosphatase Blue Substrate; DAB; HighDef red IHC chromogen (AP); Permanent HRP Green; AEC Peroxidase (HRP) Substrate. 


Immunofluorescence 
Primary antibodies (reference, source): CD20, M0755, Agilent; CXCR5, MAB190-100. 
Secondary reagents: EnVision+ System-HRP, Labelled Polymer (Rabbit); EnVision+ System-HRP, Labelled Polymer (Mouse); FITC Rat Anti-Mouse IgG2b; Cy5 Goat Anti-Mouse IgG2a. 
Detection: Alexa Fluor 647 Tyramide Reagent; Alexa Fluor 594 Tyramide Reagent; Alexa Fluor 488 Tyramide Reagent. 




Software and code 


Policy information about availability of computer code 


Data collection: Immunohistochemistry images were analysed with HALO 10 software (IndicaLab). Immunofluorescence data were obtained with 
AxioScan (Zeiss). 


Data analysis: Data were analysed with R software (version 3.4.4) and the packages gplots, survival and FactoMineR. Custom code was produced in R for the 
analysis. 

Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A description of any restrictions on data availability 


The transcriptomic datasets analysed in this study can be accessed on the GDC Portal (TCGA SARC) and the Gene Expression Omnibus repository (accession 
numbers GSE21050, GSE21122, GSE30929). Immunohistochemistry, gene expression and clinical data related to the NTUH cohort (Fig. 3, Extended Data Figs. 7 and 8) are 
available from the corresponding author on reasonable request. The data that support the findings related to Fig. 4 are available from SARC but restrictions apply to 
the availability of these data, which were used under license for the study. Data are however available upon reasonable request to HAT (HTawbi@mdanderson.org) 
and with permission of SARC. All code used in this study is available from the authors upon reasonable request. 

Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size: TCGA SARC: n=213, GSE21050: n=283, GSE21122: n=72, GSE30929: n=40, FSG: n=168, NTUH: n=93, SARC028: n=47. Total: n=916. 
Data exclusions: 20 tumours from the NTUH cohort were excluded from gene expression (and SIC) analysis due to low quality of the extracted RNA. 
Replication: No replication was done, but validation cohorts were analysed. 


Randomization: Randomization is only relevant to the SARC028 cohort, which was previously published. 


Blinding: All image and data analyses were performed blindly, independently of sample knowledge. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems Methods 
n/a | Involved in the study n/a | Involved in the study 
Antibodies ChIP-seq 
Eukaryotic cell lines Flow cytometry 
Palaeontology MRI-based neuroimaging 


Animals and other organisms 


Human research participants 


Clinical data 


Antibodies 


Antibodies used: CD3: 2GV6, Roche; DC-Lamp: 1010E1.01, Dendritics; CD20: L26, Agilent; CD8: C8/144B, Agilent; CD21: 1F8, Agilent; CD23: 
SP23, Abcam; CD34: Qbend-10, Agilent; PD-L1: E1L3N, Cell Signaling; PD-1: EH33, CoStim Pharmaceuticals. 


Validation: The specificity of anti-CD3, anti-CD4, anti-CD8, anti-CD20, anti-CD21, anti-CD23, anti-CD34, anti-CXCR5 and anti-DC-Lamp 
antibodies, and MECA-79 (PNAd) was validated on FFPE tonsil sections as positive control. For anti-CD20, certified 
manufacturing facilities from the company guarantee full quality control including western blot and studies using COS-1 cells 
transfected with cDNA encoding the CD20 molecule indicate that the antibody labels an intracytoplasmic epitope localized on 
the CD20 molecule. For anti-CD8, certified manufacturing facilities from the company guarantee full quality control including 
western blot and indicate that the antibody recognizes the CD8 alpha chain. For anti-CD34, certified manufacturing facilities from 
the company guarantee full quality control. For anti-CD21, certified manufacturing facilities from the company guarantee full 
quality control including western blotting of the immunogen, and that the antibody labels cells or cell lines known to express 
CD21 (Raji, NC 37, tonsil cells), whereas no labeling is observed in the CD21-negative Jurkat cells (T-cell line) and human 
erythrocytes. For anti-CD23, certified manufacturing facilities from the company guarantee full quality control including western 
blotting, IHC on human tonsils and flow cytometry on Raji cells. For anti-CXCR5, certified manufacturing facilities from the 
company guarantee full quality control using human CXCR5 transfectants by flow cytometry and lack of cross reactivity with 
human CXCR2, CXCR3, or CXCR4 transfectants. For PNAd, certified manufacturing facilities from the company guarantee full 
quality control including western blotting, IHC and flow cytometry. For anti-PD-L1, specificity was validated by the company using 
immunohistochemical analysis of paraffin-embedded human placenta using PD-L1 (E1L3N®) XP® Rabbit mAb in the presence of 
control peptide or antigen-specific peptide. Specificity was verified by using FFPE sections from placenta as positive control and 
cerebral cortex tissue as negative control. Anti-PD-1 (Freeman GJ and colleagues) was obtained from CoStim Pharmaceuticals and 
validated as described in Fig. S1 of Giraldo et al., Clinical Cancer Research, 2015. Tonsil, placenta and cerebral cortex slides were 
obtained from Geneticist Inc. 


Human research participants 


Policy information about studies involving human research participants 


Population characteristics: All available characteristics are reported in Extended Data Table 1. 
Recruitment: Patients were recruited prior to the study and were not selected on specific criteria other than their pathology. 


Ethics oversight: The research was approved by the Research Ethics Committee of NTUH (201605061RINA). 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Checkpoint blockade therapies that reactivate tumour-associated T cells can induce 
durable tumour control and result in the long-term survival of patients with advanced 
cancers. Current predictive biomarkers for therapy response include high levels of 
intratumour immunological activity, a high tumour mutational burden and specific 
characteristics of the gut microbiota. Although the role of T cells in antitumour 
responses has been studied thoroughly, other immune cells remain insufficiently 
explored. Here we use clinical samples of metastatic melanomas to investigate the 
role of B cells in antitumour responses, and find that the co-occurrence of 
tumour-associated CD8+ T cells and CD20+ B cells is associated with improved survival, 
independently of other clinical variables. Immunofluorescence staining of CXCR5 and 
CXCL13 in combination with CD20 reveals the formation of tertiary lymphoid 
structures in these CD8+CD20+ tumours. We derived a gene signature associated with 
tertiary lymphoid structures, which predicted clinical outcomes in cohorts of 
patients treated with immune checkpoint blockade. Furthermore, B-cell-rich tumours 
were accompanied by increased levels of TCF7+ naive and/or memory T cells. This was 
corroborated by digital spatial-profiling data, in which T cells in tumours without 
tertiary lymphoid structures had a dysfunctional molecular phenotype. Our results 
indicate that tertiary lymphoid structures have a key role in the immune 
microenvironment in melanoma, by conferring distinct T cell phenotypes. 
Therapeutic strategies to induce the formation of tertiary lymphoid structures 
should be explored to improve responses to cancer immunotherapy. 


Alongside T cells, B cells are the other main component of the adaptive immune 
system. B cells localized in so-called tertiary lymphoid 
structures (TLSs), which have been identified in several types of cancer, 
including melanoma, may improve antigen presentation, increase 
cytokine-mediated signalling and release tumour-specific antibodies; they are 
associated with improved prognosis and, to some extent, with clinical 
responses to CTLA4. Additional evidence on the importance of TLSs 
in the tumour immune microenvironment is provided in the accompanying 
Articles. In our analysis of the immune microenvironment 
of melanoma tumours, we found infiltration of CD8+ T cells in 33% of 
cases: 25% of the tumours had CD8+ T cells localized in clusters, and 
42% were devoid of CD8+ T cells (Extended Data Table 1). By contrast, 
we found CD20+ B cell clusters in 25% of the cases and such clusters 
consisted of both Ki67+ and Ki67− B cells (Fig. 1a), which suggests that some 
B cells are activated and proliferating. Notably, CD20+ B cell clusters 
were in all cases surrounded mainly by CD4+ T cells, which indicates 
formation of TLSs (Extended Data Fig. 1a). We then analysed whether 
these CD20+ B cell clusters have similarities to bona fide TLSs. Known 
molecular markers of TLS formation include increased expression of 
CXCL13, CXCR5 and DC-LAMP. These markers were all upregulated 
in transcriptomic data from matched tumour tissue (Fig. 1b). Moreover, 
immunofluorescence staining of two known TLS markers (CXCR5 
and CXCL13), in combination with CD20, supported the notion that 
these CD20+ B cell clusters have molecular properties that have been 
described as necessary for TLS formation (Fig. 1c). By contrast, CD8+ 
T cells were localized mainly outside of such TLSs, but the presence 
of TLSs was in all cases coupled with tumour-associated CD8+ T cells 
(Fig. 1a, Extended Data Table 1). The formation of TLSs may indicate 
that tumour antigens are recognized by the immune system. The inability 
of the immune system to completely eradicate the tumour would 

Fig. 1 | Identification of CD20+ B cell clusters in melanoma tumours. 
a, Representative immunostaining of CD20 (B cells), Ki67 (proliferating cells), 
SOX10 (melanoma cells), CD3 (T cells) and CD8 (T cells). In total, 177 melanoma 
specimens (including 113 lymph node metastases, 35 subcutaneous 
metastases, 10 visceral metastases and 15 primary tumours) were analysed. 
Sections were taken consecutively to spatially analyse the different 
immunostainings. Scale bars, 100 µm (patient 1), 200 µm (patient 2). HE, 
haematoxylin and eosin stain. b, Gene-expression heat map of known TLS 
marker genes. The gene-expression data were obtained from matched tumour 
tissue (n = 160), as was used for the immunostaining. c, Representative 
immunofluorescence staining of CD20 (green) in combination with CXCR5 
(red) or CXCL13 (red) in a melanoma tumour known to have TLSs, selected from 
the immunostaining cohort in a. Arrows indicate a CXCL13+ cell cluster. 


subsequently lead to chronic inflammation, which is characterized 
by infiltrating immune cells and generation of TLSs. Importantly, 
because all tumours with TLSs had tumour-associated CD8+ T cells 
(Fig. 1a, Extended Data Table 1), we hypothesized that TLSs may support 
the activation of CD8+ T cell attack against tumour cells. Indeed, 
survival analysis revealed that the presence of tumour-associated 
CD8+ T cells or TLSs was associated with improved patient outcome 
in uni- and multivariate analyses (Fig. 1d, e, Extended Data Table 2). 
The combination of both TLSs and CD8+ T cells was associated with the 
best survival outcome, CD8+ T cells alone were linked with intermediate 
survival, and the absence of both TLSs and CD8+ T cells was associated 
with the worst survival outcome (Fig. 1f). The survival association of 
the TLS/CD8+ group was sustained in multivariate analysis adjusting 
for disease stage, metastasis localization, age and gender (P = 0.006, 
multivariate Cox regression model) (Extended Data Fig. 1b, Extended 
Data Table 2). Transcriptomic data showed additional differences in 
immunological gene signatures (Extended Data Fig. 1c). Although 
TLSs were not restricted to lymph node metastases, these metastases 
represented the most-prevalent sample site containing TLSs (Extended 
Data Fig. 1d, Extended Data Table 1). To further understand the role of 
TLSs in tumours, we determined the spatial location of TLSs (on the 
tumour border or infiltrating), the number of TLSs per square 
millimetre and the presence of germinal-centre-like structures within 

d-f, Kaplan-Meier survival analysis of the cohort stratified by CD8 (d), CD20 (e) 
and the combination of these two markers (f); n = 165, n = 164 and n = 164 patients with 
available follow-up information in d, e and f, respectively. Cox regression 
analysis was used to calculate P values. Numbers below plots represent 
numbers of patients. g, TLSs were evaluated for the level of maturation using 
Ki67 immunostaining and spatial location. Mature germinal-centre (GC)-like 
structures were detected exclusively in TLSs located in lymph node 
metastases, and there was no difference in spatial location between lymph 
node metastases and others. h, A representative case with multiple TLSs 
(numbered 1-4). TLS 2 and TLS 3 show a germinal-centre-like structure within 
the TLS, whereas TLS 1 and TLS 4 lack these structures. The representative case 
was selected from n = 18 investigated cases with multiple TLSs. 
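
The Kaplan-Meier curves in d-f estimate survival as a running product over the observed event times. The Python sketch below illustrates the standard estimator on hypothetical follow-up data; it is not the authors' analysis code, and the function name and inputs are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time of each patient
    events -- 1 if the event (death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the fraction of at-risk patients who survive past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and event indicators.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Censored patients lower the number at risk without producing a step in the curve, which is why the numbers of patients shown below such plots shrink faster than the curve itself drops.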


TLSs using Ki67 immunostaining (Fig. 1g, h). The location of the TLS 
was independent of metastatic site and, notably, tumours with infiltrative 
TLSs had a significantly higher frequency of melanomas with a 
tumour-infiltrative CD8+ T cell pattern (P = 0.009, Fisher’s exact test). 
Using survival analysis of patients with regional lymph node metastases, 
we found a trend for patients with tumour-infiltrative TLSs having 
improved survival (Extended Data Fig. 1e). In total, 44% of cases with 
TLS had multiple TLSs per square millimetre, and these were found 
only in lymph node metastases (Extended Data Fig. 1f). Moreover, we 
found nine cases in which canonical germinal-centre-like structures 
were present within TLSs (Fig. 1g). Importantly, we found cases in which 
TLSs containing germinal-centre-like structures coexisted with loose, 
non-germinal-centre-like TLSs in the same tumour (Fig. 1h). The presence 
in the tumour of TLSs with germinal-centre-like structures was 
not associated with patient outcome or the CD8+ T cell infiltration 
pattern. In all, these data support the notion that different types of 
TLSs exist in individual tumours and that this is independent of the 
spatial location of the TLS. To reveal the molecular properties of the 
different T cell, B cell and tumour cell populations, we used the GeoMx 
digital spatial profiler (Nanostring) to perform high-plex proteomic 
analysis (Extended Data Table 3) with spatial resolution (Extended 
Data Fig. 2a). GeoMx data from CD20+ B cell populations localized in 
TLSs revealed two main groups, characterized by high or low expression 

Fig. 2 | B cell heterogeneity and T cell phenotypes using high-plex proteomic 
and scRNA-seq data. a, Unsupervised hierarchical clustering of B cell 
populations (n = 30) from TLSs across 17 melanoma tumours. Two groups, 
which were independent of tumour core and patient, were clearly discerned on 
the basis of Ki67 expression. Proteins were filtered on the basis of scRNA-seq 
data: proteins whose genes were expressed in B cells were included, 
and those whose genes were not expressed in single B cells were excluded. b, T cell 
populations (n = 22) in or in close proximity to Ki67-high or Ki67-low B cell 
populations, respectively, were analysed for differences. Box plots of CD4 and 
BCL-2 show increased expression in T cells located in proximity to Ki67-high 
B cells. P value from two-sided Wilcoxon rank-sum test. c, d, Differential 
analysis of T cell populations (n = 91) from 43 melanoma tumours. Proteins 


of Ki67 (Fig. 2a, Extended Data Fig. 2b). Indeed, highly proliferating 
B cells may operate in germinal centres: the Ki67-high tumour-associated 
B cells that were additionally characterized by increased CD40 expression 
may therefore belong to more mature TLSs. The data provide 
further support for the idea that TLSs at different stages exist in the 
same tumour (Fig. 2a). T cells found in, or in close proximity to, TLSs 
with Ki67-high B cells tended to have a higher proportion of CD4+ cells and 
increased expression of BCL-2 (Fig. 2b). These T cells may therefore have 
undergone antigen activation that subsequently led to the upregulation 
of the pro-survival anti-apoptotic molecule BCL-2. Collectively, 
these data support the hypothesis that these B cells and T cells belong 
to mature TLSs. To understand the effect of TLSs on the intratumoral 
T cell landscape, we analysed different properties of T cells obtained 
from within or in close proximity to TLSs, infiltrating T cells in tumours 
with TLSs and T cells from tumours without TLSs. We found increased 
CD4 and decreased CD8 expression in T cells from within, or in close 
proximity to, TLSs (Fig. 2c, d). In addition, T cells in tumours without 
TLSs had increased expression of TIM3, PD1 and GZMB and decreased 
expression of BCL-2 (Fig. 2c, d). This is consistent with a recent study 
that demonstrates that T cells in patients who were not responding 
to immune checkpoint blockade (ICB) had a dysfunctional molecular 
phenotype. These findings also suggest that distinct patterns of 
intratumoral adaptive immune activation exist, and that these patterns 

were filtered on the basis of a false-discovery rate (FDR) cut-off. FDR 
(Benjamini-Hochberg adjustment) from P values of Kruskal-Wallis test. Box 
plots of selected proteins with differential expression. e, scRNA-seq data of 
CD4+ and CD8+ T cells in B-cell-rich and -poor tumours, respectively. Heat map 
displays tumour means of 27 up- and downregulated genes, as ranked by FDR 
from a two-sided t-test. B-cell-poor (n = 16) and -rich (n = 16) tumours are 
defined as those in the lower and upper tertiles, respectively, in terms of the 
percentage of total cells that are B cells (<1% and >5.3%, respectively). The most 
significant and relevant genes are highlighted. In the box plots, the centre line 
is the median, the box limits are the lower and upper quartiles, and the whiskers 
extend to the most extreme values within 1.5× the interquartile range (IQR). 
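
The Benjamini-Hochberg adjustment named in this caption converts a set of P values into FDR values by ranking them. A minimal Python sketch of the standard step-up procedure is given here; the P values are hypothetical, and the function is an illustration rather than the code used for the figure.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(p_values)
    # Indices of the P values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values, for example one Kruskal-Wallis test per protein.
p = [0.001, 0.04, 0.03, 0.2]
fdr = benjamini_hochberg(p)
```

A protein would then pass the filter when its adjusted value falls below the chosen FDR cut-off, which the caption does not specify.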


may partly be driven by TLSs. We then investigated the expression of 
immune markers on captured tumour cell populations. The largest 
difference was found when comparing tumours without an immune 
cell presence to other tumours. As expected, the loss of antigen pres- 
entation—via B2M and HLA-DR and decreased PDL1 expression—was 
found in tumours without an immune cell presence (Extended Data 
Fig. 2c). However, there was no difference in PDL1 expression in tumour 
cells between tumours with TLSs and tumours with T cells alone. We 
further confirmed the loss of B2M protein using immunostaining, and 
found that protein loss was associated with increased frequency of DNA 
copy number loss at the B2M gene locus. Moreover, we confirmed the 
loss of MHC using the transcriptomic data. Notably, the inflammatory 
state and presence of TLSs were not associated with tumour mutational 
burden or any specific driver-gene mutation (Extended Data Fig. 2d-g). 

To gain a deeper molecular understanding of the tumour-associated 
B cells, we used single-cell RNA-sequencing (scRNA-seq) data. After 
extracting all B cells from 27 melanoma tumours used in a previous 
study, we used gene sets to define activated, immature and memory 
B cells, as well as plasma cells. We found transcriptional evidence 
that a mixture of activated and immature B cells, and only a small 
fraction of plasma cells, are present in melanoma tumours (Extended 
Data Fig. 3a), which provides further support for the presence of TLSs. 
A fraction of single B cells expressed the class-switching and affinity 

Nature | Vol 577 | 23 January 2020 | 563 


maturation gene AICDA or the master regulator of germinal-centre 
initiation, BCL6. Moreover, genes important for germinal-centre 
initiation (IRF4, POU2AF1, MEF2C, MYC, MEF2B, IRF8, BCL6, MCL1, TCF3, 
EBF1, SPIB, DOCK8 and BACH2), the germinal-centre light zone (CD83 
and CD86), the germinal-centre dark zone (CXCR4) and T cell interaction 
(CD40) were abundantly expressed in B cells (Extended Data Fig. 3b). 
Thus, the transcriptional data suggest a wide range of B-cell-derived, 
immature-to-mature germinal-centre signals. This is consistent with 
the heterogeneity of TLS states observed in the immunostaining and 
GeoMx data. MHC class I and II molecules displayed a uniform high 
expression across single B cells, which suggests that B cells within 
TLSs are generally capable of antigen presentation. The expression 
of IGLL1, a component of the B cell receptor in pre-B cells, displayed 
an intriguing pattern. Three clear B cell groups could be discerned: 
plasma cells, cells positive for IGLL1 and IGLL5, and cells negative for 
IGLL1 and IGLL5. These groups could be further subdivided on the 
basis of CD69 expression (Extended Data Fig. 3b). Using previously 
published scRNA-seq data, we found that the fraction of CD69+ and 
IGLL5−CD69+ cells, and not IGLL5+ B cells, was associated with the 
response to ICB (Extended Data Fig. 3c). Moreover, the CD69+ B cell 
group we identified presents a more-pronounced germinal-centre-reaction 
phenotype than the IGLL1+IGLL5+ B cell group, as CD69 is 
correlated with markers of the mature germinal centre such as CD83 
and CXCR4 (Extended Data Fig. 3d). Therefore, the observed B cell 
groups may reflect the maturation state of the underlying 
germinal-centre reaction that occurs in TLSs. By contrast, the percentages of 
IGHD+ B cells (‘unswitched’ IgD+) and IGHG+ B cells (‘switched’ IgG+) 
were not predictive of therapy outcome at baseline (Extended Data 
Fig. 3c). Collectively, these data support the presence of distinct subsets 
of B cells at different stages of B cell development, and their role 
in the response to ICB; however, further studies are needed to confirm 
the role of CD69+ B cells. Finally, we investigated whether the immune 
microenvironment of the tumour is adapted by the presence of B cells. 
In single-cell data, B-cell-rich samples contained more CD4+ and CD8+ 
T cells with naive and/or memory-like characteristics (expressing TCF7 
and IL7R) as compared to B-cell-poor samples (Fig. 2e), suggesting an 
influx of naive and memory T cells to TLSs. Such memory TCF7+ T cells 
have previously been associated with an improved response to ICB. 
This is consistent with our GeoMx data, in which T cells in tumours 
without TLSs had an exhausted-like molecular phenotype (Fig. 2c). 
Next, we used differential expression analysis to create a gene signature 
that reflects melanoma tumours with TLSs (Fig. 3a, Extended 
Data Table 4). This signature included known B-cell-specific genes such 
as CD79B. Another interesting candidate is CCR6, which was recently 
found to be upregulated in activated B cells. Indeed, in the single-cell 
data from melanomas, CCR6 and CD79B are specifically expressed in 
tumour-associated B cells. The remaining genes of the signature were 
expressed mainly by other types of immune cell (Extended Data Fig. 3e). 
Similarly, the TLS-hallmark genes CCR7, CXCR5 and SELL (which encodes 
CD62L) were expressed in single B cells and, to some degree, by CD4+ 
T cells, whereas CXCL13 is expressed predominantly by CD8+ T cells 
(Extended Data Fig. 3f). This suggests that TLSs localized in melanoma 
tumours consist of B cells and other immune cells. Next, we constructed 
a signature from a compendium of TLS-hallmark genes (CCL19, CCL21, 
CXCL13, CCR7, CXCR5, SELL and LAMP3), and found that it correlates 
closely with our TLS signature in three datasets (correlations of 0.91, 
0.85 and 0.87). Further, the TLS signature correlated strongly with B cell 
signatures and single B cell markers. The TLS signature also correlated 
with signatures of T cells and other types of immune cell, although 
not to the same extent as it did with B cell signatures (Extended Data 
Fig. 4a). To gain further support for the TLS signature we derived, we 
retrieved RNA-seq data for metastatic melanomas from The Cancer 
Genome Atlas (TCGA) project. Trichotomizing the data on the basis 
of our TLS signature confirmed the association with patient survival 
(Fig. 3b, Extended Data Table 2). Analysis of matched mutation data 

Fig. 3 | TLS gene signature derived from the CD8+CD20+ group predicts 
prognosis and response to ICB in melanoma. a, Heat map of genes specifically 
upregulated in CD8+CD20+ cases of melanoma. b, Kaplan-Meier analysis based 
on the trichotomized TLS gene signature in the melanoma metastases cohort 
from the TCGA (n = 349 patients with available follow-up information). P value 
from Cox regression analysis. c, Mutational load across TCGA TLS groupings in b. 
P value from Kruskal-Wallis test. d, Kaplan-Meier analysis for overall survival in 
patients treated with anti-CTLA4 (n = 37). P value from Cox regression analysis. 
e, Mutational pattern in patients treated with anti-CTLA4. f, Kaplan-Meier 
analysis for overall survival in patients treated with anti-CTLA4 (n = 40). 
P value from Cox regression analysis. Data from a previous study were used. 
g, Kaplan-Meier analysis for overall survival in patients treated with anti-PD1 
(n = 69). P value from Cox regression analysis. Data from a previous study were 
used. h, Kaplan-Meier analysis for overall survival in patients treated with 
anti-PD1 (n = 40). P value from Cox regression analysis. Data from a previous study 
were used. i, Mutational load across the TLS grouping, using data from a 
previous publication. P value from Kruskal-Wallis test. In b, d, f-h, patients 
were trichotomized according to high, intermediate and low expression of the 
TLS signature score. In the box plots, the centre line is the median, the box limits 
are the lower and upper quartiles, and the whiskers extend to the most extreme 
values within 1.5× IQR. Numbers below plots represent numbers of patients. 
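
The box-plot convention described at the end of this caption can be made concrete with a short Python sketch. The data are hypothetical, and the quartile interpolation (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since the caption does not say how quartiles were computed.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Median, quartiles and whisker ends under the 1.5 x IQR rule."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations that lie inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical measurements with one extreme value.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

In this example the value 30 lies beyond the upper fence, so the upper whisker ends at 8 and 30 would be drawn as a separate point.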


revealed no difference in mutational burden (Fig. 3c). Notably, samples 
with a TLS-high signature also included non-lymph-node metastases, 
and, when extended to primary tumours, a small portion of the primary 
tumours also had a high TLS gene score (Extended Data Fig. 4b). 
Collectively, this confirms a prognostic role for TLS in melanoma. 
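
Trichotomizing a cohort by signature score, as done for the survival comparisons above, amounts to cutting the scores at two thresholds. The Python sketch below assumes equal-sized tertiles, which the text does not state explicitly; the scores and the function name are hypothetical.

```python
from statistics import quantiles


def trichotomize(scores):
    """Label each score 'low', 'intermediate' or 'high' using tertile cut-offs."""
    # Assumption: the cut-offs are the two tertile boundaries of the cohort itself.
    lower, upper = quantiles(scores, n=3, method="inclusive")
    labels = []
    for s in scores:
        if s <= lower:
            labels.append("low")
        elif s <= upper:
            labels.append("intermediate")
        else:
            labels.append("high")
    return labels


# Hypothetical TLS signature scores, one per tumour.
signature_scores = [0.2, 1.5, 0.9, 2.8, 3.1, 0.4, 1.1, 2.2, 1.9]
groups = trichotomize(signature_scores)
```

The resulting labels can then serve as the grouping variable for a Kaplan-Meier or Cox analysis.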
Given the success of ICB in treating melanoma, we investigated 
the importance of tumour-associated TLSs in response to therapy 
(Extended Data Table 5). First, we gathered a collection of melanoma 
tumour biopsies from patients who were receiving CTLA4 blockade. 
Trichotomizing gene-expression data on the basis of the TLS signature 
revealed that TLS-high tumours in particular were associated with 
significantly increased survival after CTLA4 blockade (Fig. 3d, Extended Data 
Fig. 5a, Extended Data Table 2). Mutation data in melanoma driver genes 
further supported the notion that the TLS signature is independent of 
tumour genetic mechanisms (Fig. 3e). We further verified the predictive 
effect of the TLS signature using previously published data from an 
additional cohort of 40 patients with melanoma who were receiving 
CTLA4 blockade (Fig. 3f). Previous studies have demonstrated tumour 
mutational burden as a predictive biomarker for response to ICB. 
However, in this cohort of patients treated with anti-CTLA4, the TLS 
signature is independent of mutational load (Extended Data Fig. 5b). 
Moreover, the TLS signature was significantly associated with overall 
survival in a previously published dataset of pretreatment samples 
from 69 patients who were undergoing anti-PD1 monotherapy or 
anti-CTLA4 and anti-PD1 combination therapy (Fig. 3g). We also observed 
the predictive effect of the TLS signature in a previously published 
dataset of pretreatment samples from 41 patients who were treated with 
anti-PD1 (of whom 50% had been exposed to anti-CTLA4 before anti-PD1 
treatment) (Fig. 3h, Extended Data Fig. 5c). Finally, we performed meta 
Cox regression analysis across the four cohorts treated with ICB, using 
multiple immune signatures: of these, our TLS signature performed 
best (Extended Data Fig. 5d). The TLS signature was also independent 
of tumour mutational load in the cohort treated with anti-PD1 (Fig. 3i), 
consistent with previous studies that have shown that immune gene 
signatures are not correlated with mutational load. Although we did 
not observe significant differences in the TLS gene-expression score 
retrieved from pretreatment biopsies with regards to ‘response evaluation 
criteria in solid tumours’ (RECIST), we observed a notable difference 
in RNA-seq data from on-treatment biopsies that were collected on 
cycle 1 at day 29, which was confirmed in previously published cases of 
patients treated with anti-PD1 (Extended Data Fig. 5e, f). This indicates 
that TLS functionality is induced by ICB treatment in patients with a 
clinical response. To further determine the biological relevance of our 
TLS signature, we applied it to RNA-seq data from 13 additional samples 
of melanoma that were obtained from patients who were receiving ICB, 
and performed concurrent immunostaining of CD20 and CD3. The 
samples with the highest TLS gene score contained TLSs (as detected 
by CD20 immunostaining), which confirms the ability of our gene sig- 
nature to predict samples with TLSs (Extended Data Fig. 5g). 

In conclusion, our data provide evidence that TLSs may have a key 
role in sustaining an immune-responsive microenvironment. This find- 
ing opens avenues for therapeutic strategies that aim at enhancing 
TLS formation and function, which could result in improved clinical 
outcomes and responses to cancer immunotherapy. 


Online content 


Any methods, additional references, Nature Research reporting sum- 
maries, source data, extended data, supplementary information, 
acknowledgements, peer review information; details of author con- 
tributions and competing interests; and statements of data and code 
availability are available at https://doi.org/10.1038/s41586-019-1914-8. 


1. Robert, C. et al. Pembrolizumab versus ipilimumab in advanced melanoma. N. Engl. J. 
Med. 372, 2521-2532 (2015). 

2. Gopalakrishnan, V. et al. Gut microbiome modulates response to anti-PD-1 
immunotherapy in melanoma patients. Science 359, 97-103 (2018). 


3. Cristescu, R. et al. Pan-tumor genomic biomarkers for PD-1 checkpoint blockade-based 
immunotherapy. Science 362, eaar3593 (2018). 

4. Ladanyi, A. et al. Prognostic impact of B-cell density in cutaneous melanoma. Cancer 
Immunol. Immunother. 60, 1729-1738 (2011). 

5. Messina, J. L. et al. 12-Chemokine gene signature identifies lymph node-like structures in 
melanoma: potential for patient selection for immunotherapy? Sci. Rep. 2, 765 (2012). 

6. Cipponi, A. et al. Neogenesis of lymphoid structures and antibody responses occur in
human melanoma metastases. Cancer Res. 72, 3997-4007 (2012).

7. Sautés-Fridman, C., Petitprez, F., Calderaro, J. & Fridman, W. H. Tertiary lymphoid
structures in the era of cancer immunotherapy. Nat. Rev. Cancer 19, 307-325 (2019).

8. Petitprez, F. A. d. R et al. B cells are associated with survival and immunotherapy response
in sarcoma. Nature https://doi.org/10.1038/s41586-019-1906-8 (2020).

9. Helmink, B. A. et al. B cells and tertiary lymphoid structures promote immunotherapy
response. Nature https://doi.org/10.1038/s41586-019-1922-8 (2020).

10. Mihm, M. C., Jr & Mule, J. J. Reflections on the histopathology of tumor-infiltrating
lymphocytes in melanoma and the host immune response. Cancer Immunol. Res. 3,
827-835 (2015).

11. Dieu-Nosjean, M. C., Goc, J., Giraldo, N. A., Sautés-Fridman, C. & Fridman, W. H. Tertiary
lymphoid structures in cancer and beyond. Trends Immunol. 35, 571-580 (2014).

12. Germain, C., Gnjatic, S. & Dieu-Nosjean, M. C. Tertiary lymphoid structure-associated B 
cells are key players in anti-tumor immunity. Front. Immunol. 6, 67 (2015). 

13. Bindea, G. et al. Spatiotemporal dynamics of intratumoral immune cells reveal the 
immune landscape in human cancer. Immunity 39, 782-795 (2013). 

14. Amaria, R. N. et al. Neoadjuvant immune checkpoint blockade in high-risk resectable
melanoma. Nat. Med. 24, 1649-1654 (2018).

15. De Silva, N. S. & Klein, U. Dynamics of B cells in germinal centres. Nat. Rev. Immunol. 15,
137-148 (2015).

16. Rogers, P. R., Song, J., Gramaglia, I., Killeen, N. & Croft, M. OX40 promotes Bcl-xL and Bcl-2
expression and is essential for long-term survival of CD4 T cells. Immunity 15, 445-455
(2001).

17. Sade-Feldman, M. et al. Defining T cell states associated with response to checkpoint
immunotherapy in melanoma. Cell 175, 998-1013 (2018).

18. Jerby-Arnon, L. et al. A cancer cell program promotes T cell exclusion and resistance to
checkpoint blockade. Cell 175, 984-997 (2018).

19. Angelova, M. et al. Characterization of the immunophenotypes and antigenomes of
colorectal cancers reveals distinct tumor escape mechanisms and novel targets for
immunotherapy. Genome Biol. 16, 64 (2015).

20. Tarte, K., Zhan, F., De Vos, J., Klein, B. & Shaughnessy, J. Jr. Gene expression profiling of 
plasma cells and plasmablasts: toward a better understanding of the late stages of B-cell 
differentiation. Blood 102, 592-600 (2003). 

21. Suan, D. et al. CCR6 defines memory B cell precursors in mouse and human germinal 
centers, revealing light-zone location and predominant low antigen affinity. Immunity 47, 
1142-1153 (2017). 

22. Gide, T. N. et al. Distinct immune cell populations define response to anti-PD-1
monotherapy and anti-PD-1/anti-CTLA-4 combined therapy. Cancer Cell 35, 238-255 
(2019). 

23. Cirenajwis, H. et al. NF1-mutated melanoma tumors harbor distinct clinical and biological 
characteristics. Mol. Oncol. 11, 438-451 (2017). 

24. Cancer Genome Atlas Network. Genomic classification of cutaneous melanoma. Cell 161, 
1681-1696 (2015). 

25. Becht, E. et al. Estimating the population abundance of tissue-infiltrating immune and 
stromal cell populations using gene expression. Genome Biol. 17, 218 (2016). 

26. Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single- 
cell RNA-seq. Science 352, 189-196 (2016). 

27. van Allen, E. M. et al. Genomic correlates of response to CTLA-4 blockade in metastatic 
melanoma. Science 350, 207-211 (2015). 

28. Riaz, N. et al. Tumor and microenvironment evolution during immunotherapy with 
nivolumab. Cell 171, 934-949 (2017). 

29. Lauss, M. et al. Mutational and putative neoantigen load predict clinical benefit of 
adoptive T cell therapy in melanoma. Nat. Commun. 8, 1738 (2017). 

30. Roh, W. et al. Integrated molecular analysis of tumor biopsies on sequential CTLA-4 and 
PD-1 blockade reveals markers of response and resistance. Sci. Transl. Med. 9, eaah3560 
(2017). 


Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in 
published maps and institutional affiliations. 


© The Author(s), under exclusive licence to Springer Nature Limited 2020 


Nature | Vol577 | 23 January 2020 | 565 




Methods 


No statistical methods were used to predetermine sample size. The 
experiments were not randomized and investigators were not blinded 
to allocation during experiments and outcome assessment. 


Patient material 

This study was approved by the Regional Ethics Committee at Lund 
University (Dnr.191/2007 and 101/2013). The sample cohort, represent- 
ing a population-based retrospective collection (n=177), was obtained 
at the Department of Surgery at Skane University Hospital. 

Overall, 104 patients had regional metastatic disease, 50 distant
disease and 19 local disease. Four patients were of unknown stage.
This is a historical cohort, collected between 2000 and 2012. As such,
the cohort is suitable for prognostic studies. A summary of the patient
characteristics is provided in Extended Data Table 1.

We also collected paraffin-embedded tumour tissue from 119
patients; 37 of these patients had received anti-CTLA4 as first-line
therapy. Tumour tissue was collected from these patients in Denmark,
and available biopsies were obtained a maximum of six months before
the start of therapy. This study was approved by the regional ethical
committee (H-15010200). DNA and RNA were extracted using the Qiagen
FFPE AllPrep procedure, as previously described.

We retrieved frozen tumour tissue from 13 patients who were undergoing
anti-PD1 therapy at Skane University Hospital, under ethical
approval Dnr. 101/2013. RNA-seq and data analysis were performed as
previously described.


High-plex proteomic analysis 

We used the NanoString GeoMx platform for high-plex proteomic analysis
with spatial resolution, as previously described. Two 5-µm tissue microarray
slides were used. Antibodies against CD3, CD20, DAPI, and PMEL
and S100B were used for immunofluorescence, which was subsequently
used for region of interest selection and UV masking. Digital counts from
barcodes corresponding to protein probes (in total 60 immune-related
proteins) were analysed as follows: raw counts were first normalized with
internal spike-in controls (ERCC) to account for system variation. To control
for nonspecific antibody binding, values were further normalized by a
linear scaling factor to obtain IgG control counts of 1 for each region of
interest. To reduce background noise, values below 3 were set to 1 and the
data were log2-transformed. Data are provided in Supplementary Information.
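
The order of the normalization steps described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the pipeline used in the study: the function names, the use of a geometric mean to summarize the ERCC spike-in and IgG controls, and the `ercc_reference` target level are assumptions introduced here. Only the sequence of steps (spike-in normalization, scaling so that IgG equals 1, setting values below 3 to 1, log2 transformation) is taken from the text.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_roi(counts, ercc_counts, igg_counts, ercc_reference,
                  noise_threshold=3.0, floor_value=1.0):
    """Normalize the protein counts of one region of interest (ROI).

    counts: dict mapping protein name -> raw digital count.
    ercc_counts, igg_counts: raw counts of the spike-in and IgG controls.
    ercc_reference: spike-in level every ROI is scaled to (assumed here).
    """
    # Step 1: spike-in normalization to account for system variation.
    ercc_factor = ercc_reference / geometric_mean(ercc_counts)
    # Step 2: linear scaling so that the IgG control equals 1 in this ROI.
    igg_level = geometric_mean(igg_counts) * ercc_factor
    normalized = {}
    for protein, raw in counts.items():
        value = raw * ercc_factor / igg_level
        # Step 3: values below the noise threshold are set to the floor value.
        if value < noise_threshold:
            value = floor_value
        # Step 4: log2 transformation.
        normalized[protein] = math.log2(value)
    return normalized


# Hypothetical counts for a single ROI.
example = normalize_roi(
    counts={"CD20": 600.0, "CD3": 40.0},
    ercc_counts=[100.0, 400.0],
    igg_counts=[10.0, 40.0],
    ercc_reference=200.0,
)
```

Because both scalings are linear and applied per region of interest, the spike-in factor cancels in the final IgG-relative value; it is kept explicit here only to mirror the order in which the steps are described.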


Immunohistochemistry 

Tissue microarrays were constructed using, on average, three 1-mm
cores per tumour in an attempt to obtain a representative picture of the
tumour. The tissue block was cut in 4-µm sections, and then dried at 60 °C
for 1 h. The paraffin-embedded sections were deparaffinized and pretreated
in the PT-Link (DAKO) with target retrieval solution buffer pH 9.
The following steps (except for the primary antibody staining) were
performed in the DAKO staining equipment (Autostainer plus) with DAKO
kit K8010 solutions: peroxidase block (5 min), EnVision HRP-conjugated
polymers (30 min), DAB substrate-chromogen solution (2 × 5 min)
and counterstaining with haematoxylin (4 min). Between each step,
the sections were rinsed with washing buffer. Finally, the sections
were dehydrated and mounted with PERTEX mounting medium
(ref. 00811) (Histolab). The primary antibodies used were all from
Agilent/DAKO: CD3 (A0452) in 1:200 dilution, CD8 (M7103) in 1:100 dilution,
MITF (Clone CS), B2M (A0072), Ki67 (MIB-1) in 1:500 dilution and CD20
(M0755) in 1:400 dilution. SOX10 staining was performed in the clinical
routine laboratory of clinical pathology (Skane University Hospital) using
the mouse monoclonal IgG1 (clone BC34, Biocare Medical) antibody.


Immunofluorescence staining 
Initially, the cells from snap-frozen tumours known to have TLSs were
incubated in ice-cold acetone for 10 min and washed in PBS. All the
following steps were performed in a humidified chamber. Unspecific
binding sites were masked with PBS + 3% BSA for 90 min at room
temperature. Mouse-anti-CD20 (1:200, 00064779, DAKO), rabbit-anti-CXCR5
(1:200, 3180237-9, Abcam) and rabbit-anti-CXCL13 (1:200,
NBP2-1604155, Novus Biologicals) were applied overnight at 4 °C.
Donkey-anti-mouse-AF488 and goat-anti-rabbit-AF546 were applied
1:1,000 in PBS + 1% BSA for 90 min at room temperature, followed by
mounting with DAPI-containing mounting medium (Vector Laboratories).
Fluorescence images were acquired with an Olympus BX63
microscope, DP80 camera and cellSens Dimension v.1.12 software
(Olympus).


Bioinformatic and statistical analyses 

Datasets. Microarray expression data were generated using the Illumina
HT12 arrays, and have been used in a previous publication;
they are deposited in Gene Expression Omnibus, accession number
GSE65904. Mutation data were generated using a sequencing panel
targeting 1,550 cancer genes, as previously described, and copy number
data were derived from the corresponding raw sequencing data using
Contra version 2.0.3 with segmentation using GLAD.

RNA-seq data of metastatic melanomas from TCGA (level 3, release
3.1.14.0) were downloaded from the data portal, quantile-normalized
and log-transformed as log2(data + 1).

The PD1-treatment RNA-seq data from ref. 22 were downloaded
as fastq files from the European Nucleotide Archive (PRJEB23709)
and fragments per kilobase of transcript per million mapped reads
(FPKM) values were retrieved using HISAT and Stringtie. The data were
reduced to protein-coding genes, samples were quantile-normalized
and the data were log-transformed as log2(data + 1). Previously published
PD1 inhibitor-treatment RNA-seq data (ref. 28) were downloaded as
count data (‘CountData.BMS038.txt’) with annotations from
https://github.com/riazn/bms038_analysis/tree/master/data. The data were
reduced to protein-coding genes and normalized for transcript lengths
using exon annotations from the R package
TxDb.Hsapiens.UCSC.hg19.knownGene, subsequently transformed to transcripts
per million (TPM) and quantile-normalized. The data were log-transformed
as log2(data + 2) − 1. Previously published NanoString gene-expression
data (ref. 30) were downloaded from the respective supplementary table.
Previously published CTLA4 inhibitor-treatment data (ref. 27) were received
from the authors as reads per kilobase of transcript per million mapped reads
(RPKM) values; the data were quantile-normalized and log-transformed
as log2(data + 1). scRNA-seq data were retrieved from Gene Expression
Omnibus accessions GSE115978 and GSE120575, protein-coding
genes were kept and cells with fewer than 1,700 or 1,000 genes expressed
>0 were removed, respectively. Data for B cells were extracted, and
quantile-normalized. For GSE115978, the available B cell definition
was used; for GSE120575, no B cell definition was available and B cells
were defined as CD19 > 2.
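
The transformations named in this paragraph (counts to TPM, quantile normalization, and the offset log2 transforms) can be illustrated with a small standard-library Python sketch. It is a hedged illustration under stated assumptions, not the code used in the study, which was run in R: the function names are hypothetical, gene lengths are assumed to be given in base pairs, and ties in quantile normalization are broken by input order, which is a simplification.

```python
import math


def counts_to_tpm(counts, lengths_bp):
    """Convert read counts of one sample to transcripts per million."""
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def quantile_normalize(samples):
    """Quantile-normalize a list of samples (equal-length lists of values).

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by input order (an assumption).
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(s[i] for s in sorted_samples) / len(samples)
                 for i in range(n_genes)]
    normalized = []
    for sample in samples:
        order = sorted(range(n_genes), key=lambda i: sample[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


def log2_offset(values, offset=1.0, shift=0.0):
    """Compute log2(value + offset) - shift for each value."""
    return [math.log2(v + offset) - shift for v in values]


tpm = counts_to_tpm([10, 20], [1000, 2000])
qn = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
zero_maps_to_zero = log2_offset([0.0], offset=2.0, shift=1.0)
```

With `offset=1.0` and `shift=0.0` the last function gives log2(data + 1); with `offset=2.0` and `shift=1.0` it gives log2(data + 2) − 1, which also maps an expression value of 0 to 0.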

We generated gene-expression profiles from 119 formalin-fixed
paraffin-embedded (FFPE) samples using Affymetrix Clariom D microarrays.
The hybridized FFPE material constituted three separate retrospective
studies, including the 37 pre-ipilimumab treatment samples analysed
in this study. Principal component (PC) analysis informed us that this
FFPE-derived data was greatly affected by sample degradation. We
therefore reduced the data to probesets mapping to the 3’ untranslated
region (UTR) of curated RefSeq transcripts; using PRINCIPAL categories
from APPRIS, we obtained 33,111 probesets in the 3’ UTRs of the
principal gene isoforms. We further selected the two cohorts from the same
Danish site, and removed one sample with a sample median
expression < 0 and 7 samples with a median control exon (‘HTA2-pos’ probes)
minus median control intron (‘HTA2-neg’ probes) expression < 1.
The remaining probesets were filtered for being expressed, by keeping
probesets that were above the median control intron expression in at
least 90% of samples (19,990 probesets). The most-varying probeset
for each protein-coding gene was kept (10,197 genes), and quantile
normalization was applied. As the data was still affected by degradation,
PC1 and PC2 of the data were removed using R package swamp,
an offset of 1.5 was added to revert negative values, and the 5,000 genes
with the largest variation were kept. Gene-expression data of the 37
samples of cutaneous melanoma with ipilimumab pretreatment were
extracted for this study (Supplementary Information). Additional data
and codes are available from the corresponding author upon request.
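
Two of the filtering rules in the paragraph above, keeping probesets expressed above the control-intron background in at least 90% of samples and then keeping the most-varying probeset per gene, can be sketched as follows. This is a minimal Python illustration, not the analysis code from the study: the function names and the probeset identifiers are hypothetical, and variance is assumed as the measure of "most-varying", which the text does not specify.

```python
import statistics


def filter_expressed(probesets, intron_median_per_sample, min_fraction=0.9):
    """Keep probesets above the per-sample control-intron median in at
    least min_fraction of samples.

    probesets: dict mapping probeset id -> list of expression values,
    one value per sample, in the same order as intron_median_per_sample.
    """
    kept = {}
    for probeset_id, values in probesets.items():
        n_above = sum(v > background
                      for v, background in zip(values, intron_median_per_sample))
        if n_above >= min_fraction * len(values):
            kept[probeset_id] = values
    return kept


def most_variable_per_gene(probesets, probeset_to_gene):
    """Keep, for each gene, the probeset with the largest variance."""
    best = {}
    for probeset_id, values in probesets.items():
        gene = probeset_to_gene.get(probeset_id)
        if gene is None:
            continue  # probeset not mapped to a protein-coding gene
        variance = statistics.pvariance(values)
        if gene not in best or variance > best[gene][1]:
            best[gene] = (probeset_id, variance)
    return {gene: probesets[pid] for gene, (pid, _) in best.items()}


# Hypothetical data: three probesets measured in ten samples.
background = [4.0] * 10
data = {
    "ps1": [5.0, 6.0, 7.0, 8.0, 9.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    "ps2": [5.0] * 9 + [3.0],
    "ps3": [1.0] * 10,
}
expressed = filter_expressed(data, background)
per_gene = most_variable_per_gene(expressed, {"ps1": "GENE_X", "ps2": "GENE_X"})
```

In this toy example `ps3` is dropped as not expressed, and `ps1` is retained for the hypothetical `GENE_X` because it varies more than `ps2`.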


TLS signature. To derive the TLS-signature genes, we performed SAM
analysis to identify genes overexpressed in CD8+CD20+ versus CD8+
groups and subtracted the genes overexpressed in CD8+ versus
double-negative groups (Extended Data Table 4). For each dataset, the
signature genes that were present were extracted. Failed genes were defined
as having an average Pearson correlation <0.15 to the other signature
genes, and were excluded. The signature score was calculated as the
mean gene expression. For survival analyses, the signature score was
divided into equally sized tertiles.
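
The scoring procedure described above can be sketched in standard-library Python. This is an illustration under stated assumptions, not the study's R code: the gene names are placeholders, failed genes are removed in a single pass (the text does not say whether the filter is iterated), and tertile boundaries are assigned by rank.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0


def signature_score(expression, signature_genes, min_mean_r=0.15):
    """Return (kept genes, per-sample score) for one dataset.

    expression: dict mapping gene -> list of expression values per sample.
    """
    present = [g for g in signature_genes if g in expression]
    kept = []
    for gene in present:
        others = [g for g in present if g != gene]
        if not others:
            kept.append(gene)
            continue
        mean_r = sum(pearson(expression[gene], expression[o])
                     for o in others) / len(others)
        # Genes with an average correlation below the cut-off are "failed".
        if mean_r >= min_mean_r:
            kept.append(gene)
    if not kept:
        raise ValueError("no signature gene passed the correlation filter")
    n_samples = len(expression[kept[0]])
    scores = [sum(expression[g][i] for g in kept) / len(kept)
              for i in range(n_samples)]
    return kept, scores


def tertiles(scores):
    """Label each sample low, intermediate or high by rank-based tertile."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [""] * n
    for rank, index in enumerate(order):
        labels[index] = ("low", "intermediate", "high")[min(rank * 3 // n, 2)]
    return labels


# Hypothetical expression matrix: GENE_C is anti-correlated and fails,
# GENE_E is absent from the dataset and is skipped.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GENE_B": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "GENE_C": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_D": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
}
kept, scores = signature_score(
    expression, ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"])
groups = tertiles(scores)
```

The resulting group labels correspond to the high, intermediate and low categories that are used as a three-level covariate in the survival analyses.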


Statistical analyses. Fisher’s exact test was used for comparison of 
categorical variables. Pearson correlation was used for comparison of 
numerical variables. The t-test or Wilcoxon test and analysis of variance 
(ANOVA) were used for group comparisons of two or more than two 
groups, respectively. Owing to outliers, we used the Kruskal-Wallis test 
for the association of mutational load with the immunohistochemical 
groups. For univariate and multivariate survival analyses, we used Cox
regression from the survival package. All bioinformatic analyses were
done in R. All tests were two-sided. All box plots are depicted with the
centre line representing the median, the box limits representing the
lower and upper quartiles, and the whiskers extending to the most
extreme values within 1.5× IQR.
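
The box-plot convention stated above can be made concrete with a short Python sketch. It is an illustration only: the quartile interpolation method (`method="inclusive"` in the standard `statistics` module) is an assumption, and plotting software such as R may compute the box limits slightly differently for small samples.

```python
import statistics


def box_plot_stats(values):
    """Median, quartiles and whisker ends as described for the box plots."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme values inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values
                           if v < lower_fence or v > upper_fence),
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this example the value 100 lies beyond the upper fence, so the upper whisker stops at 9 and 100 is reported as an outlier.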


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


All relevant data are available and are included as Source Data. Digital 
spatial-profiling data used in Fig. 2 and gene-expression microarray data 
from Danish patients treated with anti-CTLA4 are available as Source 
Data. Data from public repositories were accessed from GSE65904
(ref. 72), TCGA data portal SKCM level 3 release 3.1.14.0, PRJEB23709 
(ref. 22), https://github.com/riazn/bms038_analysis/tree/master/data,
GSE115978 (ref. 18) and GSE120575 (ref. 17). Any other relevant data and
code can be obtained from the corresponding authors upon reasonable
request.
Extended Data Fig. 1 | Characterization of TLSs in melanoma tumours.

a, CD20 (B cells), CD3 (T cells), CD8 (CD8+ T cells) and CD4 (CD4+ T cells)
immunostaining in a representative melanoma with a TLS (n = 44 cases with
TLS in the cohort of 177 cases). b, Subset survival analysis using CD8 and CD20
immunostaining in distant and lymph node metastases separately. n = 27 and
97 patients with available follow-up information, respectively. P values from
Cox regression. c, Gene-expression characterization of the three groups using
previously described signatures. aDCs, activated dendritic cells; BVs, blood
vessels; DCs, dendritic cells; IDCs, immature dendritic cells; LVs, lymph vessels;
NK, NK cells; Tem, T effector memory cells; Tfh, T follicular helper cells;
Tfh.Th2, T follicular helper 2 cells; Th, T helper cell; Th1, T helper 1 cell;
Th2, T helper 2 cell. d, CD20, CD3, CD8, Ki67 and SOX10 immunostainings in
three distant metastases. Arrows indicate the TLS. e, Survival analysis of 33
patients with TLS-containing tumours from regional lymph node metastases,
stratified according to whether the TLS is located at the tumour border or is
tumour-infiltrative. P value from Cox regression. f, Bar plot showing
quantification of TLSs in tumours. Numbers in the box correspond to TLSs per
square millimetre. g, TLS gene score and type of lesion. n = 159 tumours.
h, TLS score and immunological group. n = 159 tumours. In the box plots, the
centre line represents the median, the box limits represent the lower and upper
quartiles, and the whiskers extend to the most extreme values within 1.5× IQR.
Numbers below the graphs represent numbers of patients.
Extended Data Fig. 2 | High-plex proteomic analysis using the GeoMx assay
and genomic characterization of tumours containing TLSs. a, Workflow of
the GeoMx assay. b, Immunofluorescence imaging of TLSs in tumour samples
used in the GeoMx analysis. TLSs are sorted according to the unsupervised
clustering of the high-plex proteomic data, performed on the different B cell
populations. Pink, CD3+ T cells; green, tumour cells positive for PMEL and/or
S100B; cyan, CD20+ B cells. For Ki67-high, 13 of 13 TLSs are displayed, and for
Ki67-low, 15 of 17 TLSs are displayed. c, GeoMx data from 83 captured tumour
cell regions. FDRs are from Kruskal-Wallis test, adjusted for multiple testing
using the Benjamini-Hochberg method. d, Left, B2M immunostaining shows a
significant difference between CD8/CD20 groups (P=1x 10™, Fisher’s exact
test, n = 172 tumours). Right, plot shows B2M copy number status (blue = loss).
P = 0.002, FDR adjustment for multiple comparisons = 0.007, Fisher’s exact
test, n = 127 tumours. e-g, MHC-I (e) and MHC-II (f) expression (n = 160
tumours, P value from ANOVA) and mutational load (g) (n = 118 tumours,
Kruskal-Wallis test) in relation to immunological groupings. h, Mutation heat
map of melanoma-relevant genes in relation to immunological grouping. In the
box plots, the centre line represents the median, the box limits represent the
lower and upper quartiles, and the whiskers extend to the most extreme values
within 1.5× IQR.
Extended Data Fig. 3 | scRNA-seq analysis of tumour-associated B cells.

a, Box plots of gene-expression scores, based on different B cell developmental
states in 812 B cells from 27 tumours from a previous study (ref. 18). b, Heat
map of selected genes across all 812 B cells. IGLL5 and CD69 were two of the
five genes with highest expression variation across all B cells. The heat map
is sorted on IGLL5 and CD69 expression, excluding the cells that displayed
increased expression of the plasma-cell signature. Genes showing a Pearson
correlation >0.4 to IGLL5 or CD69 expression are also indicated. SDC1 and PRDM1
mark plasma cells, BCL6 and AICDA mark germinal centres, HLA-DRA marks MHC-II
and HLA-A, HLA-B and HLA-C mark MHC-I. TLS-hallmark genes,
germinal-centre-related genes and other B cell genes are also indicated.
c, Extracting the single B cell RNA-seq data from a previous study (ref. 17)
using pretreatment samples (n = 16). The fraction of CD69+ B cells was higher
in responders to ICB than in nonresponders (n = 8), but the fraction of IGLL5+
B cells was not. The fraction of IGLL5−CD69+ cells was also higher in
responders. Plots of fraction of IGHD+ and IGHG+ B cells in relation to
response to ICB therapy. P values from two-sided Wilcoxon test. In a, the
centre lines in the box plot represent the median, the box limits represent
the lower and upper quartiles, and the whiskers extend to the most extreme
values within 1.5× IQR. d, Pearson correlation between expression of CD69 and
germinal centre genes (CD83 and CXCR4) in data from a previous study (ref. 18).
Pie charts display the fact that the fraction of CD83+ and CXCR4+ B cells is
increased among CD69+ B cells. Expression >1 was used as a cut-off for being
present. Seven hundred and fifty-three B cells without a present plasma-cell
signature were analysed. P value from two-sided Fisher’s exact test. e, f, Heat
map of gene-expression values corresponding to our TLS signature (e) and
TLS-hallmark genes from the literature (f). Blue corresponds to increased
expression. Mal., malignant cells. In e, f, single cells from the seven cell
types on the left are from ref. 18, and from the four cell types on the right
are from ref. 17.
Extended Data Fig. 4 | Comparison of the derived TLS gene signature to
other immune signatures. a, Pearson correlation plots of the data from the
cohort obtained at Skane University Hospital, Lund (top, n = 160), data from
cases of melanoma metastasis in the TCGA (bottom, n = 363) and baseline data
from a previous publication (ref. 22) (right, n = 69). Black box indicates the
TLS signature. All signatures are taken from previously published studies. Red,
positive correlation; blue, negative correlation. b, TLS gene-signature scores
in primary tumours in comparison to distant and lymph node metastases. The
number of tumours assigned to the TLS-high category is indicated above the plot.
Extended Data Fig. 5 | TLS gene signature in cohorts treated by ICB.

a, Progression-free survival (PFS) and TLS gene signature in the Danish cohort
of patients treated with anti-CTLA4. P value from Cox regression. b, TLS gene
signature in relation to tumour mutational load, in data from a previous
publication (ref. 27) (n = 40 melanoma tumours). P value from Kruskal-Wallis
test. c, Survival analyses on data from a previous study (ref. 28), stratified
according to whether patients are naive to anti-CTLA4 treatment or have
progressed on anti-CTLA4. P values from Cox regression. d, Meta Cox regression
analysis across the four cohorts treated using ICB (n = 186). P values from Cox
regression adjusted for study. e, TLS gene signature of pretreatment (n = 16)
and on-treatment samples (n = 10) in relation to therapy response in data from
a previous publication (ref. 30). P value from two-sided t-test. f, TLS gene
signature of pretreatment (n = 38) and on-treatment (n = 39) samples in
relation to RECIST response in data from a previous study (ref. 28). P value
from ANOVA test. g, TLS gene-signature score in 13 melanoma tumours that were
also stained for CD20 protein. As an example, the tumour with the third highest
score had TLSs. The two top tumours also had TLSs, whereas the other tumours
did not. In the box plots in b, d, e, centre lines represent the median, the
box limits represent the lower and upper quartiles, and the whiskers extend to
the most extreme values within 1.5× IQR.


Extended Data Table 1 | Clinical features of the 177-patient cohort, shown in correlation with CD8/CD20 immunological
grouping

                              ENTIRE COHORT  CD8+/CD20+   CD8+/CD20-    CD8-/CD20-   P VALUE
                              (N=177)*       (N=44)       (N=57)        (N=74)
PATIENT CHARACTERISTICS
GENDER N (%)                                                                         0.08
  MALE                        101 (57)       30 (70)      27 (47)       43 (60)
  FEMALE                      72 (41)        13 (30)      30 (53)       29 (40)
  NA                          4 (2)
AGE AT DIAGNOSIS,             65 (22-91)     66 (22-85)   64.5 (30-88)  65 (25-91)
  MEDIAN (RANGE)
TUMOR CHARACTERISTICS
STAGE                                                                                0.003
  II                          19             -            8             11
  III                         104            35           29            39
  IV                          50             7            20            23
  NA                          4              2            -             1
METASTASIS TYPE                                                                      0.003
  LYMPH NODE                  113            38           33            41
  SUBCUTANEOUS                35             3            12            20
  VISCERAL                    10             1            5             4
  PRIMARY TUMOR               15             -            7             8
  NA                          4              2            -             1
CD8 IHC
  INFILTRATIVE N (%)          58 (33)        28 (64)      30 (53)       -
  CLUSTERED N (%)             43 (24)        16 (36)      27 (47)       -
  ABSENT N (%)                74 (42)        -            -             74 (100)
  NA N (%)                    2 (1)
CD3 IHC
  INFILTRATIVE N (%)          59 (33)        27 (61)      30 (53)       2 (3)
  CLUSTERED N (%)             52 (29)        17 (39)      27 (47)       8 (11)
  ABSENT N (%)                64 (36)        -            -             64 (86)
  NA N (%)                    2 (1)
CD20 IHC
  PRESENT N (%)               44 (25)        44 (100)     0 (0)         0 (0)
  ABSENT N (%)                131 (74)       0 (0)        57 (44)       74 (56)
  NA N (%)                    2 (1)
PRIMARY TUMOR CHARACTERISTIC
HISTOLOGICAL SUBTYPE                                                                 0.34
  UNKNOWN PRIMARY N (%)       26 (15)        7 (16)       7 (12)        12 (16)
  SSM                         36 (20)        10 (23)      13 (23)       13 (18)
  NM                          57 (33)        15 (34)      21 (37)       21 (28)
  OTHER                       17 (10)        2 (4)        3 (5)         12 (16)
  NA                          39 (22)        10 (23)      13 (23)       16 (22)

P values from Fisher’s exact test, not adjusted for multiple testing.
*CD8 and CD20 status was missing for two patients.
Extended Data Table 2 | Univariate and multivariate Cox regression model analysis of immunological groupings in
melanoma

                     Univariate analysis            Multivariate analysis*
                     HR (CI)             P-value    HR (CI)             P-value
CD8
  infiltrative       1                   -
  clustered          1.21 (0.65-2.26)    0.54
  absent             2.03 (1.20-3.43)    0.009
CD20
  present            1                   -
  absent             2.18 (1.24-3.82)    0.007
CD8/CD20
  CD8+CD20+          1                   -          1                   -
  CD8+CD20-          1.76 (0.93-3.35)    0.08       1.75 (0.91-3.37)    0.09
  CD8-CD20-          2.54 (1.40-4.61)    0.002      2.60 (1.42-4.77)    0.002
Stage*
  II                 1                   -          1                   -
  III                1.89 (0.58-6.07)    0.28       2.38 (0.73-7.73)    0.14
  IV                 7.84 (2.38-25.73)   0.0007     9.39 (2.84-31.04)   0.0002

Multivariate IHC marker and Stage
Columns: CD8, CD20, CD8/CD20, Stage, Metastasis type, Age, Gender
Model 1    0.009  4x10°
Model 2    0.004  1x10*
Model 3    0.005  9x10°
Model 4    0.006  0.02
Model 5    0.005  0.03  0.70  0.58
Model 6*   0.006  8x107  0.68  0.69  0.76

TCGA multivariate Cox analysis
                                        TLS signature  Source site**  Age    Gender
Metastases only                         0.03           0.02           0.01   0.56
All tumors including primary tumors     0.01           0.05           0.02   0.53

Hazard Ratios in TCGA And ICB Cohorts
                     HR (95% CI)          p-value
TCGA Metastases
  TLS-high           1                    -
  TLS-intermediate   1.68 (1.10-2.56)     0.02
  TLS-low            1.81 (1.20-2.74)     0.005
Danish Data
  TLS-high           1                    -
  TLS-intermediate   3.16 (1.23-11.45)    0.02
  TLS-low            2.36 (0.76-7.29)     0.14
Van Allen Data
  TLS-high           1                    -
  TLS-intermediate   3.19 (1.09-9.35)     0.03
  TLS-low            3.50 (1.17-10.51)    0.03
Gide Data
  TLS-high           1                    -
  TLS-intermediate   1.08 (0.33-3.53)     0.9
  TLS-low            4.01 (1.47-10.99)    0.007
Riaz Data
  TLS-high           1                    -
  TLS-intermediate   2.95 (0.92-9.43)     0.07
  TLS-low            4.72 (1.53-14.56)    0.007

Hazard ratios and confidence intervals in the bottom panel correspond to the Kaplan-Meier plots in Fig. 3.
*Model 6 includes CD8/CD20 groups, stage, type of metastasis, age and gender as covariates.
**Source site corresponds to primary tumour, regional lymph node, regional other and distant metastasis.


Extended Data Table 3 | Immune-related proteins 
investigated in the GeoMx analysis 
Extended Data Table 4 | SAM analysis results to obtain the nine-gene TLS signature

Column pairs 1-4: genes discriminating tumors with CD8 T cells alone compared to immune poor melanomas.
Column pair 5: genes discriminating tumors with TLS and CD8 T cells compared to melanomas with CD8 T cells alone.

Gene, Fold change | Gene, Fold change | Gene, Fold change | Gene, Fold change | Gene, Fold change
C4orf7 5.850525 HLA-DMB 2.1769047 MFNG 1.7937715 HLA-C 1.6125638 DNASE1L3 2.7302825 
CD8A 4.3367634 ARHGAP9 2.1595395 MS4A7 1.7918254 CTSC 1.6079848 CCL21 2.4948518 
CXCL9 4.282619 LAMP3 2.1572433 TYROBP 1.7895821 RBM47 1.6064883 PLAC8 2.2948039 
CD3D 4.109594 CD3G 2.1555102 ARHGAP25 1.7882159 CFB 1.605289 CD79A 2.1628115
NAPSA 3.9003344 EVI2B 2.151902 LAPTM5 1.7830758 EVI2A 1.6043215 CD79B* 2.1487825
CXCL13 3.878391 IL18BP 2.1482697 LOC387841 1.7806474 RNASE6 1.6032722 MGC29506 2.1386049 
INDO 3.7693827 FCER1G 2.1476345 EBI3 1.779771 FPR3 1.5993513 CD48 2.1153479 
CCL5 3.7572536 CAMK1G 2.1367993 P2RY8 1.7786007 ABI3 1.5974672 CD52 2.1007886
LTB 3.6854622 TBC1D10C 2.131732 C2 1.77573 RGS18 1.5973897 EIF1AY* 1.953128 
CCL21 3.6670513 AIM2 2.129987 NCF1 1.774397 LY86 1.5972974 LTB 1.9445939 
CCL19 3.634043 CD53 2.111374 PTPN22 1.7705708 PARP9 1.5966645 LRMP 1.9264982
IDO1 3.5465145 LCP1 2.1084743 FUCA1 1.7698385 OASL 1.5946562 FCRL3 1.9218179 
GZMK 3.4980235 HLA-F 2.1038625 GIMAP6 1.76921 SERPINA1 1.590805 IGJ 1.907474 
CXCL10 3.3749766 FGD3 2.0983796 BIK 1.7660133 SLCO2B1 1.5882876 CCL19 1.9025567 
GZMA 3.32443 HLA-DRB4 2.096038 OAS2 1.7645061 SUSD3 1.5878158 BIRC3 1.8866026
LOC652775 3.3141112 BIRC3 2.0951495 OLR1 1.7627058 UBE2L6 1.586905 PTPRCAP 1.8761556 
UBD 3.2973506 PYHIN1 2.0907834 GBP2 1.7621691 PSME2 1.5857576 NAPSB 1.8707682 
CD69 3.232438 PRF1 2.0893168 TLR8 1.7585747 PLCL2 1.5817555 CD37 1.8535264 
NKG7 3.2006474 FAM113B 2.0810633 EBI2 1.7581428 SAMD9L 1.5809402 CCR7 1.8436617
CD247 3.1996102 HCLS1 2.078721 RNF126P1 1.7579799 P2RY10 1.5796638 EAF2 1.8174217 
CD2 3.148034 GVIN1 2.074809 IFIH1 1.7515248 PIK3CG 1.5796467 FAM46C 1.8072395 
GBP5 3.1334765 TNFSF10 2.069712 CCR2 1.7498112 MX1 1.5793523 CD27 1.7974521 
GZMB 3.0648828 HLA-DMA 2.0650768 AMICA1 1.7432792 ISG15 1.5779757 IRF8 1.792097 
CCR7 3.0349138 RASGRP1 2.0589375 AIF1 1.7415568 CDC42SE2 1.5738891 SELL 1.7876798 
CD79A 2.9795752 C1QC 2.056882 GLRX 1.7394656 P2RY13 1.5730076 PIM2 1.7812064 
CD48 2.9609842 HLA-DPB1 2.0534782 FAM46C 1.7381114 PARVG 1.5702442 CORO1A 1.7604752 
LOC647506 2.8764262 HLA-DOA 2.0447738 MARCO 1.7333692 VAMP5 1.5684881 HLA-DOB 1.7463384 
HCP5 2.8615928 PSCDBP 2.0414143 CD86 1.7330478 FOLR2 1.5683817 CD3D 1.7436912 
SPOCK2 2.8324168 C7 2.0402412 STX11 1.7318233 HLA-DQB2 1.5620639 PTGDS* 1.7368562 
PTPRCAP 2.8146172 APOL3 2.0391428 APBB1IP 1.7280052 CD300LF 1.5616324 CLECL1 1.735072 
LOC649143 2.8042858 PTPN6 2.038657 LYN 1.7253813 VCAM1 1.5614156 LOC606724 1.7307659 
IL32 2.7964118 WAS 2.0366387 TNFAIPS 1.7246032 TNFAIP3 1.5603107 RBP5* 1.7290683 
LAG3 2.7963312 BCL11B 2.0366046 INPP5D 1.724358 SLAMF7 1.5601757 UCP2 1.706331 
HLA-DQA1 2.7800684 C20orf100 2.0273168 CYSLTR1 1.7242785 HCK 1.5587567 TBC1D10C 1.7048037 
SLAMF6 2.7586188 CD37 2.0114977 MCOLN2 1.7232472 CARD11 1.558226 LIME 1.682545 
CD7 2.7440262 PRKCB 2.011462 SOD2 1.718578 CCL3 1.5579741 CD72 1.6630936 
LOC100133678 2.7211967 LOC400759 2.0097654 AKNA 1.7166778 PIK3IP1 1.5557806 CCR6* 1.6473467 
HLA-DRB6 2.713655 HLA-H 2.009027 TNFRSF1B 1.7134982 TSTD1 1.555676 TRAF3IP3 1.6443537 
CD27 2.7128735 FGD2 2.0087543 IGSF6 1.7131412 DDX60 1.553378 LGALS2 1.642509 
RARRES3 2.7052057 SLC40A1 2.0084155 CFD 1.7116679 IFIT2 1.553336 SKAP1* 1.6344548 
GBP4 2.6934311 HLA-DRB3 2.0073037 PLCG2 1.7082311 CD72 1.5531305 VNN2 1.6247652 
LOC651751 2.6899047 NCF1C 2.006084 ALOX5 1.7012193 SP140 1.5522988 PLCG2 1.6141123 
LOC652694 2.685632 CECR1 2.0038574 DHRS9 1.7011156 ZFP36 1.5516263 SLAMF6 1.6129341 
JSRP1 2.6221983 RASAL3 1.9993367 PPP1R16B 1.7008039 UBA7 1.5514268 PTPN6 1.605178 
GIMAP7 2.6219752 GIMAP5 1.9982749 LAX1 1.6922069 BATF 1.5485026 CD247 1.6045387 
IRF8 2.6158779 PIK3AP1 1.9907101 LST1 1.6902115 IFI27 1.547818 GAPT 1.6027378 
PRKCB1 2.5978892 ITK 1.9889216 ACSL5 1.68995 CSF2RA 1.5462152 LAT* 1.5986894 
LOC728835 2.573962 CD38 1.9883611 FCGR1B 1.6899358 NCF2 1.5452782 CD38 1.5831853 
IGLL3 2.560279 CD96 1.9737343 HLA-E 1.6876732 TAP2 1.5432647 CETP* 1.5777292 
LOC100133583 2.553927 PTPRC 1.9713553 PIM2 1.681412 DAPP1 1.5431949 PSCDBP 1.5753787 
CD52 2.5466175 CCL18 1.9712074 ARHGAP30 1.680812 IFI35 1.5372068 ARHGAP9 1.5631298 
NAPSB 2.5303907 GIMAP2 1.9646053 HLA-G 1.6804048 ALOX5AP 1.5365016 CD1D* 1.5562539 
SH2D1A 2.5151858 CYBB 1.9622309 SLC7A7 1.6793996 ABCG1 1.5314059 LAX1 1.5507433 
LOC649923 2.5026162 CXCL11 1.9613086 PSCD4 1.6785691 LYL1 1.5313303 C7 1.5440828 
LOC652493 2.490692 PTGER4 1.9604341 CLEC4A 1.6785455 XAF1 1.5307403 CD6 1.5386331 
LOC647450 2.4846182 ITGAL 1.9591348 KYNU 1.678476 RGS10 1.5273899 CD3G 1.5267668 
CORO1A 2.4811893 MS4A6A 1.9552932 CTSH 1.6757919 FLI4 1.526178 DOCK8 1.5072867 
LOC606724 2.4712005 TAP1 1.9535233 IL10RA 1.6756753 IRF7 1.5240406 PVRIG 1.5064694 
HLA-DQB1 2.4548614 GIMAP1 1.943057 HAVCR2 1.6754664 IL4I1 1.5239596 
DNASE1L3 2.4548588 MGC29506 1.9380807 SEPP1 1.6752254 ADRB2 1.5232662 
GZMH 2.4348328 IGLL1 1.9257094 IL15 1.6723912 LAP3 1.5217831 
GBP1 2.4314487 DOCK2 1.9256791 PSTPIP2 1.6706566 APOBEC3G 1.5210005 
HCST 2.4196472 PLA2G7 1.9201738 UCP2 1.667601 TOX2 1.5196278 
PLEK 2.417525 CCL3L3 1.9036595 SIGLEC10 1.666853 LPXN 1.5194254 
FGL2 2.4134262 TRIM22 1.9034454 ZBP1 1.6624216 FBXO6 1.5154316 
CCL13 2.4028354 KLRD1 1.9017122 ITGB2 1.6614282 ASCL2 1.5137706 
IGJ 2.4006696 IFI44L 1.8944559 OAS1 1.6578766 AADACL1 1.5122902 
IRF1 2.3972383 C1QA 1.8862053 LOC647108 1.6575867 MYO1G 1.5105464 
LOC730415 2.377494 IKZF1 1.8854065 IFITM1 1.6569607 IFI6 1.5104185 
CCL4L2 2.3680255 ALDH2 1.8754903 CLECL1 1.65586 EAF2 1.5100641 
GNLY 2.3650563 DOCK8 1.8699276 ICOS 1.653902 LILRB3 1.5093257 
IFNG 2.3623235 SEMA4D 1.8665683 RASSF5 1.6529869 CSF1R 1.5070226 
IL2RB 2.347795 NCKAP1L 1.859875 C1S 1.651071 CYTH4 1.5061525 
EPSTI1 2.335121 ANKRD22 1.8569533 MS4A4A 1.6509206 MGAT4A 1.505879 
CCL4L1 2.3269346 SLA 1.8484899 LAIR2 1.6478871 LAT2 1.5024157 
LYZ 2.3133543 CTSS 1.8483866 LILRB5 1.644692 SLC31A2 1.5022907 
LOC731682 2.3117633 WARS 1.8480946 LILRB2 1.6428468 ATP8B4 1.5022835 
HLA-DPA1 2.3102183 SASH3 1.8472831 PSMB8 1.642432 RCSD1 1.5018578 
LOC642073 2.3056629 RGS1 1.8432442 CXCR3 1.6415471 GCH1 1.5016657 
HLA-B 2.303435 C1orf162 1.8429196 SPI1 1.6396817 LILRA5 1.5010871 
ADAMDEC1 2.2967832 CRKRS 1.8394536 CXCR4 1.6376555 
STAT1 2.2745712 LRMP 1.8371218 FCGR1A 1.6370385 
CD74 2.273994 SRGN 1.8356284 DEF6 1.6362371 
C1QB 2.2717638 CEBPA 1.8305081 ITGB7 1.6362158 
PVRIG 2.2529182 VNN2 1.8215137 NCF4 1.6340253 
CCL8 2.252525 GIMAP8 1.8203901 AGPAT9 1.6268378 
LOC642113 2.251248 CASP1 1.8195084 LGALS2 1.6255531 
FAM26F 2.2444625 DENND2D 1.8194357 CD163 1.6254405 
LIME1 2.2311883 GMFG 1.8193122 FGR 1.62524 
SELL 2.2240884 GPR65 1.8181477 ADD3 1.6250473 
PSMB9 2.2207227 MAP4K1 1.8174882 ANKRD29 1.6237315 
RAC2 2.2163737 ISG20 1.8124862 FCN1 1.623711 
IL7R 2.215939 FBP1 1.8124799 MIR155HG 1.6234839 
FYB 2.2151234 CLEC12A 1.8113737 CD209 1.6229211 
LOC401845 2.2078538 CXCL12 1.8111467 LILRB4 1.6225767 
HLA-DRA 2.1994276 HLA-DOB 1.807083 IFIT3 1.6202953 
KLRB1 2.1980007 STAT4 1.8062018 KIR2DL4 1.6192334 
PLAC8 2.1869752 SLAMF8 1.8041736 TRAF3IP3 1.6186843 
GIMAP4 2.1838875 IFI30 1.7981355 GAPT 1.6185887 
FCRL3 2.1787016 RSAD2 1.7952874 TLR7 1.6158199 
FAIM3 2.1784108 PRKCH 1.7944446 STK17B 1.6147563 
CD6 2.1783032 SAMSN1 1.7943485 SLC15A3 1.6138071 


*Gene unique to the tumours with TLS. 
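
The asterisk convention in the table can be read programmatically. The following is a minimal sketch, not the authors' SAM pipeline: it parses an excerpt of the fifth gene/fold-change column pair (values copied from the table) and collects the starred genes, which appear to correspond to the nine-gene TLS signature named in the table title. The function names are illustrative.

```python
# Excerpt of the fifth gene/fold-change column pair of Extended Data Table 4.
# A trailing "*" marks a gene unique to the tumours with TLS (table footnote).
TLS_VS_CD8_EXCERPT = """
DNASE1L3 2.7302825
CCL21 2.4948518
CD79B* 2.1487825
EIF1AY* 1.953128
CCL19 1.9025567
PTGDS* 1.7368562
RBP5* 1.7290683
CCR6* 1.6473467
SKAP1* 1.6344548
LAT* 1.5986894
CETP* 1.5777292
CD1D* 1.5562539
"""


def parse_rows(text):
    """Return (gene, fold_change, starred) tuples from 'GENE[*] value' lines."""
    rows = []
    for line in text.strip().splitlines():
        name, value = line.split()
        rows.append((name.rstrip("*"), float(value), name.endswith("*")))
    return rows


def unique_genes(rows):
    """Keep only the genes flagged with an asterisk, in table order."""
    return [gene for gene, _fold_change, starred in rows if starred]


rows = parse_rows(TLS_VS_CD8_EXCERPT)
signature = unique_genes(rows)
print(len(signature), signature)
```

Only the excerpted rows are included here; unstarred genes such as CCL21 and CCL19 also appear in the first comparison and are therefore excluded.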


Extended Data Table 5 | Clinical features of the cohorts treated by ICB 


ENTIRE COHORT (N=201) TLS HIGH (N=44) TLS INT (N=57) TLS LOW (N=74) 
DANISH ANTI-CTLA4 TREATED COHORT 
COHORT N (%) 37 (18) 13 (35) 12 (32) 12 (32) 
AGE AT TREATMENT MEDIAN (RANGE) 63 (33-84) 65 (34-77) 64 (34-80) 59 (33-80) 
LDH LEVELS MEDIAN (RANGE) 207 (125-545) 211 (136-302) 184 (147-434) 205 (143-545) 
METASTATIC SITE 
SKIN 11 1 5 5 
LYMPH NODE 14 6 5 3 
CNS 3 1 0 2 
LUNG 3 1 1 1 
OTHER 3 1 1 1 
NA 3 - - - 
VAN ALLEN ANTI-CTLA4-TREATED COHORT (N=40)! 
AGE AT DIAGNOSIS (RANGE) 59 (32-83) 57 (33-83) 61 (43-77) 59 (32-71) 
GENDER (%) 
MALE 26 (65) 10 (38) 7 (27) 9 (35) 
FEMALE 14 (35) 3 (21) 6 (43) 5 (36) 
RIAZ ANTI-PD1 TREATED COHORT (N=40)? 
PRIOR ANTI-CTLA4 TREATMENT (%) 
YES 19 (48) 8 (42) 7 (37) 4 (21) 
NO 21 (52) 5 (24) 6 (29) 10 (48) 
M STAGE 
M1A 10 4 4 2 
M1B 7 4 1 2 
M1C 16 4 5 7 
NA 6 - 3 3 
GIDE ANTI-PD1 TREATED COHORT (N=61)? 
GENDER N (%) 
MALE 45 (65) 14 (31) 16 (36) 15 (33) 
FEMALE 24 (35) 9 (38) 7 (29) 8 (33) 
AGE MEDIAN (RANGE) 61 (24-90) 68 (24-78) 62 (40-82) 58 (42-90) 
METASTATIC SITE 
LYMPH NODE 19 9 6 4 
SUBCUTANEOUS 43 10 17 16 
LUNG 3 3 - - 
OTHER 4 1 3 3 
Statistics 


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 
n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 




For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection The following software was used in the study: 

R version 3.5.1 with base/standard packages and additional packages: estimate_1.0.13, GLAD_2.46.0, limma_3.38.3, survival_2.42-3, 
swamp_1.4.1 and TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2. SAM analysis in the TMeV tool (v. 4.8.1). HISAT (v. 2.1.0), Stringtie (v. 
.3.3b) (https://ccb.jhu.edu/software/stringtie/). samtools 1.9 (https://sourceforge.net/projects/samtools/files/samtools/). CONTRA 
2.0.3 (http://contra-cnv.sourceforge.net/). 


Data analysis The following software was used in the study: 

R version 3.5.1 with base/standard packages and additional packages: estimate_1.0.13, GLAD_2.46.0, limma_3.38.3, survival_2.42-3, 
swamp_1.4.1 and TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2. SAM analysis in the TMeV tool (v. 4.8.1). HISAT (v. 2.1.0), Stringtie (v. 
.3.3b) (https://ccb.jhu.edu/software/stringtie/). samtools 1.9 (https://sourceforge.net/projects/samtools/files/samtools/). CONTRA 
2.0.3 (http://contra-cnv.sourceforge.net/). 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


All relevant data are available and are included with the manuscript as source data. Digital spatial profiling data used in Fig. 2 and gene expression microarray data 
from anti-CTLA4 treated patients are available as source data. Data from public repositories were accessed from GSE65904 (Cirenajwis et al.), TCGA data portal 
SKCM level 3 release 3.1.14.0 (TCGA data), PRJEB23709 (Gide et al.), https://github.com/riazn/bms038_analysis/tree/master/data (Riaz et al.), GSE115978 (Jerby- 
Arnon et al.) and GSE120575 (Sade-Feldman et al.). Any other relevant data and code can be obtained from the corresponding authors upon reasonable request. 


Field-specific reporting 


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 


[X] Life sciences    [ ] Behavioural & social sciences    [ ] Ecological, evolutionary & environmental sciences 


For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size These cohorts are unique; sample sizes are n=177 for the Lund cohort and n=37 for the Danish anti-CTLA4 cohort. The cohorts were collected 
over a long period and we believe they are sufficient to ask the questions we put forward in the manuscript. In addition, we use cohorts 
for which data are available in public repositories. 


Data exclusions Patients were excluded from analyses only if the tissue specimen did not include any tumor cells, as determined by a dermatopathologist. 


Replication The derived TLS gene expression signature was validated in several different datasets across multiple gene expression platforms. 


Randomization Randomization is not relevant for this study as we are analysing tumor-specific phenotypes. 


Blinding Immunostaining evaluations were performed blinded by three independent readers, one of whom was a board-certified dermatopathologist. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems Methods 
n/a | Involved in the study n/a | Involved in the study 
Antibodies ChIP-seq 
Eukaryotic cell lines Flow cytometry 
Palaeontology MRI-based neuroimaging 


Animals and other organisms 


Human research participants 


Clinical data 


Antibodies 


Antibodies used The following antibodies were used in this study: CD3 (polyclonal, cat no. A0452, lot no. 20066809, Dako/Agilent), CD8 (clone 
C8/144B, cat no. M7103, lot no. 20029542, Dako/Agilent), CD20 (clone L26, cat no. 760-2531, lot no. 00064779, Dako/Agilent), 
SOX10 (clone BC34, cat no. ACI 3099 A, C, Biocare), B2M (polyclonal, cat no. A0072, lot no. 00066626, Agilent/Dako), Ki67 (clone 
MIB-1, cat no. GA626, lot no. 20027876, Dako/Agilent). For immunofluorescence: rabbit-anti-CXCR5 (cat no. 3180237-9, lot no. 
GR3229212-1, Abcam) and rabbit-anti-CXCL13 (cat no. NBP2-1604155, lot no. 0141712Da843058, Novus Biologicals). 


Validation Antibodies used in this study are well validated, as demonstrated by the manufacturers and by the large number of 
citations referring to the antibodies. In addition, the SOX10 and CD20 stainings were performed at a Swedish clinically approved 
diagnostic laboratory. The following information is available for each antibody. 

CD3 (polyclonal, cat no. A0452) - Rabbit Anti-Human for IHC - In Western blotting, the antibody detects bands of the expected 
molecular weights for CD3 antigens. The antibody recognizes CD3ε in both a T-cell line (Jurkat) and a natural killer cell line 
(NK11), but does not react with lysates prepared from several B-cell lines (Raji, Ramos and JY), a myeloid cell line (U937) or a 
colon carcinoma cell line (Colo-205). 
In immunoprecipitation from Nonidet P40 lysates of surface-iodinated T lymphoblasts, the antibody precipitates the γ (26 kDa), δ 
(21 kDa) and ε (19 kDa) chains of the CD3 molecule, similar to the precipitation pattern seen using the well-characterized 
monoclonal mouse anti-human CD3, clone UCHT1. 

In ELISA, the antibody labels the CD3 peptide used as immunogen. 

CD8 (clone C8/144B, cat no. M7103) - Mouse Anti-Human - SDS-PAGE analysis of immunoprecipitates formed between lysates of 
125I-labeled human T lymphoblasts and the antibody shows reaction primarily with a 32 kDa polypeptide corresponding to 
CD8α. 

CD20 (clone L26, cat no. 760-2531) - Mouse Anti-Human - The antibody was clustered as anti-CD20 at the Fifth International 
Workshop and Conference on Human Leucocyte Differentiation Antigens held in Boston 1993. SDS-PAGE analysis of 
immunoprecipitates formed between 125I-labeled tonsil cell lysate and the antibody shows reaction primarily with 30 kDa and 
33 kDa polypeptides. Studies using COS-1 cells transfected with cDNA encoding the CD20 molecule, indicate that the antibody 
labels an intracytoplasmic epitope localized on the CD20 molecule. 

SOX10 (clone BC34, cat no. ACI 3099 A, C, Biocare) - Mouse Anti-Human - Nuclear staining of SOX10 [BC34] was observed in 
96.4% (106/110) of cases of cutaneous melanoma and 83.9% (73/87) of cases of metastatic melanoma (Table 1). Staining of 
SOX10 [BC34] was also observed in spindle cell melanoma (100%, 19/19), desmoplastic melanoma (96.6%, 28/29), benign nevi 
(100%, 20/20) and schwannomas (100%, 28/28). SOX10 [BC34] nuclear staining was observed in the expected normal 
tissues: oligodendrocytes in cerebrum and cerebellum, myoepithelial cells in breast and salivary glands, melanocytes in skin, 
and Schwann cells in peripheral nerve. 

B2M (polyclonal, cat no. A0072) - rabbit anti-human - Crossed immunoelectrophoresis: a B2M precipitation curve is visible when 
12.5 µl A0072 per cm² gel surface is used against concentrated urine from patients with tubular proteinuria. Using 2 µl human 
plasma, no precipitation is observed. Staining: Coomassie Brilliant Blue. 

Ki67 (clone MIB-1) - Mouse Anti-Human - In Western blotting of lysates of the multiple myeloma cell line, IM-9, the MIB-1 
antibody labels bands of 345 and 395 kDa, identical to the bands labeled by the original Ki-67 antibody. Furthermore, Western 
blotting and competitive binding experiments clearly demonstrate that MIB-1, like the original Ki-67 antibody, reacts with an 
epitope encoded by a 66 bp repetitive element in the Ki-67 gene. In immunohistochemistry, the MIB-1 and the Ki-67 antibodies 
provide identical staining patterns on serial tonsillar frozen sections. The MIB-1 antibody recognizes native Ki-67 antigen and 
recombinant fragments of the Ki-67 molecule. 

IF-specific antibodies 

rabbit-anti-CXCR5 (cat no. 3180237-9) - rabbit anti-human - In Western blot, one specific band is observed in mouse B cells. 
Additional evidence comes from IF experiments on lymphoma cells. 

rabbit-anti-CXCL13 (cat no NBP2-1604155) - rabbit anti-human - In Western Blot one specific band is observed in 293T whole 
extract. Additional evidence comes from IF experiments on HepG2 cells. 
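
The staining percentages quoted for the SOX10 [BC34] antibody above can be checked directly from the case counts. The short sketch below recomputes each percentage from the positive/total counts given in the validation text and rounds to one decimal place; the dictionary layout and function name are illustrative only.

```python
# Positive / total case counts as quoted in the SOX10 [BC34] validation text.
SOX10_COUNTS = {
    "cutaneous melanoma": (106, 110),
    "metastatic melanoma": (73, 87),
    "spindle cell melanoma": (19, 19),
    "desmoplastic melanoma": (28, 29),
    "benign nevi": (20, 20),
    "schwannomas": (28, 28),
}


def percent_positive(positive, total):
    """Percentage of positive cases, rounded to one decimal place."""
    return round(100 * positive / total, 1)


sox10_percent = {
    tissue: percent_positive(pos, total)
    for tissue, (pos, total) in SOX10_COUNTS.items()
}

for tissue, pct in sox10_percent.items():
    print(f"{tissue}: {pct}%")
```

The recomputed values agree with the percentages stated in the text.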


Human research participants 


Policy information about studies involving human research participants 


Population characteristics Overall, 104 patients had regional metastatic disease, 50 had distant disease and 19 had local disease. Four patients were of unknown 
stage. This is a historic cohort collected between 2000 and 2012. 57% of patients were male and the average age at diagnosis was 65. 
64% of samples were lymph node metastases, 20% were subcutaneous metastases, 6% were visceral metastases and 8% were primary 
tumors. 


Recruitment The Lund pre-checkpoint inhibitor cohort was collected prospectively from 1997-2012 at the Dept of Surgery, Skåne University 
Hospital in Sweden. Only patients for whom sufficient tissue was available were included in the study. The Danish anti-CTLA4 
treated cohort is retrospective and includes all patients receiving first-line anti-CTLA4 treatment up until year 2016. 


Ethics oversight This study was approved by the Regional Ethics Committee at Lund University (Dnr. 191/2007 and 101/2013). The sample 
cohort, representing a population-based retrospective collection (n=177), was obtained at the Department of Surgery at Skåne 
University Hospital. We also collected paraffin-embedded tumor tissue from 119 patients, of whom 37 received anti-CTLA4 
as first-line therapy. These patients were recruited in Denmark and the available biopsy was obtained a maximum of six 
months before the start of therapy. This study was approved by the regional ethical committee (H-15010200). 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 
There are amendments to this paper 


Epithelial-to-mesenchymal transitions (EMTs) are phenotypic plasticity processes 
that confer migratory and invasive properties to epithelial cells during development, 
wound-healing, fibrosis and cancer. EMTs are driven by SNAIL, ZEB and TWIST 
transcription factors together with microRNAs that balance this regulatory 
network. Transforming growth factor β (TGF-β) is a potent inducer of 
developmental and fibrogenic EMTs. Aberrant TGF-β signalling and EMT are 
implicated in the pathogenesis of renal fibrosis, alcoholic liver disease, non-alcoholic 
steatohepatitis, pulmonary fibrosis and cancer. TGF-β depends on RAS and 
mitogen-activated protein kinase (MAPK) pathway inputs for the induction of 
EMTs. Here we show how these signals coordinately trigger EMTs and integrate 
them with broader pathophysiological processes. We identify RAS-responsive 
element binding protein 1 (RREB1), a RAS transcriptional effector, as a key partner 
of TGF-β-activated SMAD transcription factors in EMT. MAPK-activated RREB1 
recruits TGF-β-activated SMAD factors to SNAIL. Context-dependent chromatin 
accessibility dictates the ability of RREB1 and SMAD to activate additional genes that 
determine the nature of the resulting EMT. In carcinoma cells, TGF-β-SMAD and 
RREB1 directly drive expression of SNAIL and fibrogenic factors stimulating 
myofibroblasts, promoting intratumoral fibrosis and supporting tumour growth. In 
mouse epiblast progenitors, Nodal-SMAD and RREB1 combine to induce expression 
of SNAIL and mesendoderm-differentiation genes that drive gastrulation. Thus, 
RREB1 provides a molecular link between RAS and TGF-β pathways for coordinated 
induction of developmental and fibrogenic EMTs. These insights increase our 
understanding of the regulation of epithelial plasticity and its pathophysiological 
consequences in development, fibrosis and cancer. 


Oncogenic mutations in KRAS are prevalent in pancreatic adenocarcinoma 
(PDA) and strongly potentiate the induction of EMT by TGF-β. 
We transduced an inducible KRAS(G12D) oncogene into pancreatic 
epithelial organoids from Pdx1-cre;Cdkn2a";lox-stop-lox (LSL)-YFP 
(CIY) mice (Fig. 1a), and treated the organoids with either TGF-β or 
SB505124 (SB), which blocks endogenous TGF-β signalling. Before 
induction of KRAS(G12D) expression, TGF-β caused a modest (fourfold) 
increase in Snail expression and did not alter organoid morphology 
or survival. When KRAS(G12D) was induced, TGF-β treatment caused a 
30-fold increase in Snail expression (Fig. 1b), followed by a decrease in 
E-cadherin, increase in ZEB1, organoid dissociation (Fig. 1c, Extended 
Data Fig. 1a) and apoptosis (Supplementary Video 1), all characteristic 
of a lethal EMT. Induction of Smad7 expression, a conserved 
TGF-β negative-feedback response, was independent of KRAS(G12D) 
(Fig. 1b). TGF-β modulated the expression of 56 genes by more than 
fourfold and KRAS(G12D) augmented TGF-β induction of 13 of these 
genes (Extended Data Fig. 1b, c), including Snail and hyaluronan 
synthase 2 (Has2), known regulators of EMT (Extended Data Fig. 1d). 
We confirmed this response pattern in different pancreatic organoids 
and primary cultures (Extended Data Fig. 1e, f). These TGF-β responses 
required SMAD4, as shown in PDA cells with restored SMAD4 expression 
(Extended Data Fig. 1g). 
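
As an illustration of the kind of filter described in this paragraph (genes modulated more than fourfold by TGF-β, and the subset whose induction is augmented by KRAS(G12D)), here is a minimal sketch. The expression values are hypothetical and chosen only to echo the reported pattern; the twofold "augmentation" criterion (`boost`) and the control gene `Actb` are assumptions of this sketch, not taken from the paper.

```python
import math

# Hypothetical normalized expression values: (SB505124, TGF-beta) per condition.
# These numbers are illustrative only and are not data from the study.
EXPRESSION = {
    "Snail": {"no_kras": (10.0, 45.0), "kras": (10.0, 300.0)},
    "Has2": {"no_kras": (5.0, 22.0), "kras": (5.0, 90.0)},
    "Smad7": {"no_kras": (8.0, 64.0), "kras": (8.0, 66.0)},
    "Actb": {"no_kras": (100.0, 105.0), "kras": (100.0, 98.0)},
}


def log2_fold(pair):
    """log2 of the TGF-beta / SB505124 expression ratio."""
    sb, tgfb = pair
    return math.log2(tgfb / sb)


def classify(expression, min_fold=4.0, boost=2.0):
    """Return (responsive, augmented) gene lists.

    responsive: |fold change| > min_fold in at least one condition.
    augmented:  responsive genes whose induction with KRAS(G12D) exceeds the
                induction without it by at least `boost`-fold (assumed rule).
    """
    threshold = math.log2(min_fold)
    responsive, augmented = [], []
    for gene, cond in expression.items():
        lfc_no = log2_fold(cond["no_kras"])
        lfc_kras = log2_fold(cond["kras"])
        if max(abs(lfc_no), abs(lfc_kras)) > threshold:
            responsive.append(gene)
            if lfc_kras - lfc_no >= math.log2(boost):
                augmented.append(gene)
    return responsive, augmented


responsive, augmented = classify(EXPRESSION)
print("responsive:", responsive)
print("KRAS-augmented:", augmented)
```

Working in log2 space treats up- and downregulation symmetrically, which matches the word "modulated" in the text.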


1Cancer Biology and Genetics Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA. 2Developmental Biology Program, Sloan Kettering Institute, 
Memorial Sloan Kettering Cancer Center, New York, NY, USA. 3Wellcome Trust-Medical Research Council Centre for Stem Cell Research, University of Cambridge, Cambridge, UK. 4Gerstner 
Sloan Kettering Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, NY, USA. 5Microchemistry and Proteomics, Memorial Sloan Kettering Cancer 
Center, New York, NY, USA. 6Present address: Tsinghua University School of Medicine, Department of Basic Sciences, Beijing, China. 7Present address: Department of Histo-embryology, 
Genetics and Developmental Biology, Shanghai Key Laboratory of Reproductive Medicine, Shanghai Jiao Tong University School of Medicine, Shanghai, China. 8Present address: Chemical 
Biology and Therapeutics Science program, Broad Institute, Cambridge, MA, USA. *e-mail: j-massague@ski.mskcc.org 
Fig. 1 | RREB1 is a KRAS-dependent SMAD cofactor. a, Source and generation 
of CIY pancreatic epithelial organoids and SMAD4-restored PDA cells. b, Snail 
and Smad7 mRNA levels in pancreatic epithelial organoid cultures. Cells 
engineered to express doxycycline-inducible KRAS(G12D) were treated with the 
TGF-β and Nodal receptor inhibitor SB505124 (SB, 2.5 µM) or TGF-β (10 pM) for 
1.5 h. Data are mean ± s.d.; n = 4; two-way analysis of variance (ANOVA), 
****P < 0.0001. c, E-cadherin, ZEB1 and DAPI immunofluorescence in CIY 
pancreatic organoids with or without KRAS(G12D) expression, treated with 
SB505124 or TGF-β for 2.5 days. Scale bars, 30 µm. Images are representative of 
two independent experiments. d, Screening of pancreatic progenitor 
transcription factor shRNA library for mediators of TGF-β-induced lethal EMT. 
Dot plot of shRNA enrichment in TGF-β-treated versus SB505124-treated 
SMAD4-restored PDA cells. Sox4 and Rreb1 transcription factor genes score 


RREB1 as a RAS-regulated SMAD cofactor 


TGF-β binding to the receptor kinases TGFBR1 and TGFBR2 activates 
SMAD2-SMAD3-SMAD4 (SMAD2/3/4) trimeric complexes, 
which target specific promoters and enhancers by interacting with 
context-determining transcription factors. SMAD2/3 chromatin 
immunoprecipitation and DNA sequencing (ChIP-seq) in PDA cells 
treated with TGF-β revealed binding motifs for various RAS transcriptional 
effectors (FOS and JUN AP-1 components and ELK3) and 
the SMAD binding motifs CAGAC and 5GC within SMAD2/3 peaks 
independently of KRAS(G12D) (Extended Data Fig. 1h-j). Notably, 
RREB1 motifs were specifically enriched within SMAD2/3 peaks 
in KRAS(G12D)-dependent TGF-β targets (Extended Data Fig. 1h). 
Although EMT is generally pro-tumorigenic in carcinoma cells, TGF-β 
triggers apoptosis in KRAS-mutant pancreatic progenitors owing to 
simultaneous induction of SNAIL and the pro-epithelial transcription 
factor SOX4. We used this property of KRAS-mutant pancreatic progenitors 
to screen an shRNA library targeting 40 transcription factors 
expressed in PDA cells, using shRNAs targeting the TGF-β receptors as 
positive controls (Fig. 1d). Rreb1 and Sox4 were the only transcription 
factor transcripts for which two independent shRNAs were enriched 
more than twofold (Fig. 1d). 
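
The hit-calling rule stated above (a gene scores when two independent shRNAs are each enriched more than twofold) can be sketched as follows. The enrichment values and the non-hit gene names (`Klf5`, `Gata6`) are hypothetical placeholders, not data from the screen, and the function is an illustration rather than the authors' analysis code.

```python
from collections import defaultdict

# Hypothetical (gene, shRNA id, enrichment in TGF-beta vs SB505124) records.
# Values and the Klf5/Gata6 entries are made up for illustration.
SCREEN_RECORDS = [
    ("Rreb1", "sh1", 3.1), ("Rreb1", "sh2", 2.4),
    ("Sox4", "sh1", 4.0), ("Sox4", "sh2", 2.2),
    ("Klf5", "sh1", 2.6), ("Klf5", "sh2", 1.1),
    ("Gata6", "sh1", 0.9), ("Gata6", "sh2", 1.3),
]


def call_hits(records, min_fold=2.0, min_shrnas=2):
    """Genes with at least `min_shrnas` distinct shRNAs enriched above `min_fold`."""
    enriched = defaultdict(set)
    for gene, shrna, fold in records:
        if fold > min_fold:
            enriched[gene].add(shrna)
    return sorted(gene for gene, ids in enriched.items() if len(ids) >= min_shrnas)


hits = call_hits(SCREEN_RECORDS)
print(hits)
```

Requiring two independent shRNAs per gene guards against off-target effects of any single hairpin, which is why a gene with only one enriched shRNA is not called.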

RREB1 contains 15 zinc fingers, but little is known about its function 
and regulation. In PDA cells in which SMAD4 expression has been 
restored and that also express haemagglutinin (HA)-tagged RREB1 
(residues 1-1291 mouse isoform) (Extended Data Fig. 2a), ligation assays 
showed close proximity between nuclear RREB1 and SMAD2/3 following 
TGF-β treatment (Extended Data Fig. 2b, c). Co-immunoprecipitation 
revealed interactions between SMAD3 and HA-RREB1 (Extended Data 
Fig. 2d). The genome-binding pattern of HA-RREB1 overlapped with 
that of SMAD2/3 in TGF-β-treated cells (Fig. 1e, f, Extended Data Fig. 2e), 


positive in the screen. shRNAs targeting Tgfbr1 and Tgfbr2 are included as 
positive controls. e, Position of RREB1 peak summits relative to summits of 
overlapping SMAD2/3 peaks (left), and position of SMAD2/3 peak summits 
relative to summits of overlapping RREB1 peaks (right), based on ChIP-seq 
analysis (Extended Data Fig. 2e). f, Venn diagram showing overlap between 
SMAD2/3 and RREB1 ChIP-seq peaks, based on ChIP-seq analysis in Extended 
Data Fig. 2e. g, Gene track view of SMAD2/3 and HA-RREB1 ChIP-seq tags at 
indicated loci and experimental conditions. Gene bodies are represented 
below the track sets. PP, proximal promoter; DE, downstream enhancer; UE, 
upstream enhancer. ChIP-seq was performed once and an independent ChIP 
was performed in which selective genomic regions were confirmed by 
quantitative PCR (qPCR). See also Extended Data Figs. 1-3 and Supplementary 
Video 1. 


including in Snail and Has2 but not in Smad7 (Fig. 1g). HA-RREB1 bound 
to these loci in the absence of TGF-β signalling (Fig. 1e-g, Extended 
Data Fig. 2e). MAPK signalling has previously been implicated in RREB1 
regulation. Treatment of SMAD4-restored PDA cells with the ERK 
inhibitor SCH772984 (ERKi) or the MEK inhibitor AZD6244 (MEKi) 
did not alter nuclear localization (Extended Data Fig. 3a) or levels of 
RREB1 (Extended Data Fig. 3b, c), but diminished binding of HA-RREB1 
to Snail, Has2 and Il11 cis-regulatory regions (Extended Data Fig. 3d). 
HA-RREB1 immunoprecipitated from PDA cell lysates bound double-stranded 
DNA probes corresponding to Snail enhancer and Has2 
promoter regions; ERKi treatment decreased this activity (Extended 
Data Fig. 3e). We identified four ERK-dependent phosphorylation 
sites in HA-RREB1 immunoprecipitated from SMAD4-restored PDA 
cells (Extended Data Fig. 3f, g); all were situated between zinc-finger 
domains (Extended Data Fig. 3h). S161 and S970 fit the MAPK phosphorylation 
motif PX(S/T)P, whereas S1138 and S175 may represent indirect 
phosphorylation by other kinases. RREB1 with S161 or S970 alanine 
substitutions was deficient in restoring Snail and Has2 TGF-β responses 
to Rreb1-knockout cells and in binding to these loci, compared with 
vectors encoding RREB1 with phosphorylation-mimicking aspartate 
substitution (Extended Data Fig. 3i, j). 
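
The MAPK consensus motif PX(S/T)P mentioned above is straightforward to scan for in a protein sequence. The sketch below uses a regular expression with a lookahead so that overlapping candidates are also reported, and returns the 1-based position of the phospho-acceptor residue. The toy sequence is invented for illustration and is not the RREB1 sequence.

```python
import re

# P-X-(S/T)-P, wrapped in a lookahead so overlapping candidate sites are found.
MAPK_MOTIF = re.compile(r"(?=(P[A-Z][ST]P))")


def find_mapk_sites(sequence):
    """Return (1-based position of the S/T residue, matched motif) pairs."""
    # The S/T sits at offset 2 within the match; +1 converts to 1-based numbering.
    return [(m.start() + 3, m.group(1)) for m in MAPK_MOTIF.finditer(sequence)]


# Toy peptide for illustration only (not a real protein sequence).
TOY_SEQUENCE = "MAPLSPGTAPPTPKSA"

mapk_sites = find_mapk_sites(TOY_SEQUENCE)
for position, motif in mapk_sites:
    residue = TOY_SEQUENCE[position - 1]
    print(f"{residue}{position}: {motif}")
```

A motif match only nominates a candidate site; as the text notes, some observed phosphosites (S1138, S175) do not fit the motif and may reflect other kinases.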


RREB1 and TGF-B-dependent EMT 


Rreb1 knockout in SMAD4-restored PDA cells (Extended Data Fig. 4a-c) 
reduced TGF-β-dependent binding of SMAD2/3 to regulatory regions 
in Snail and Has2, and abolished their induction and EMT (Fig. 2a-c, 
Extended Data Fig. 4d, e). Rreb1 knockout had limited effects on 
the binding of SMAD2/3 to, and induction of, Smad7 (Fig. 2c, Extended 
Data Fig. 4f). Restoration of RREB1 rescued induction of Snail, Has2 
and Il11 by TGF-β in Rreb1-knockout cell lines (Extended Data Fig. 4g). 
Fig. 2 | RREB1 mediates KRAS and TGF-β dependent EMT. a, Gene track view of 
SMAD2/3 ChIP-seq tags at indicated loci of Rreb1 wild type (WT) and Rreb1- 
knockout (KO) SMAD4-restored mouse PDA cells. ChIP-seq was performed once and 
confirmed for selected genomic regions by ChIP-PCR. b, ChIP-PCR analysis of 
SMAD2/3 binding to indicated sites of Snail (DE2) and Has2 (UE1) in WT and 
Rreb1-KO PDA cells after treatment with SB505124 (2.5 µM) or TGF-β (100 pM) for 
1.5 h. Data are mean ± s.e.m.; n = 4; two-way ANOVA; ****P < 0.0001. c, Levels of 
Snail, Has2, Il11 and Smad7 in WT and Rreb1-KO PDA cells after treatment with 
SB505124 or TGF-β for 1.5 h. Data are mean ± s.e.m.; n = 4; two-way ANOVA; 
****P < 0.0001. d, Volume of WT and Rreb1-KO SMAD4-restored PDA 
subcutaneous tumours in syngeneic mice. Data are mean ± s.e.m.; n = 10 
tumours, 5 mice per group; two-way ANOVA. e-g, Representative haematoxylin 
and eosin (H&E) staining (e), cleaved caspase-3 immunohistochemistry (f) and 
SNAIL immunohistochemistry (g) of subcutaneous tumours formed by WT and 
Rreb1-KO SMAD4-restored PDA cells 35 d after inoculation. Scale bars (e, f, top), 
50 µm; (e, f, bottom), 10 µm; (g) 50 µm. In e-g, images are representative of five 
biological replicates. h, Quantification of cleaved caspase-3-positive and SNAIL- 
positive cells in PDA tumour sections. n = 5 per group; two-tailed unpaired t-test; 
***P < 0.0001. In violin plots, the middle line shows the median and dotted lines 
represent first and third quartiles. i, Left, subcutaneous tumours formed by WT 
or Rreb1-KO 393T3 lung adenocarcinoma cells in syngeneic B6129SF1/J mice 
excised 35 days after inoculation. Scale bar, 10 mm. Right, tumour growth 
monitored by firefly luciferase bioluminescence imaging (BLI) plotted over time. 
Data are mean ± s.e.m.; n = 10 tumours, 5 mice per group; two-way ANOVA. j, 
Representative ex vivo lung bright-field and BLI images from mice 
21 d after tail-vein inoculation of WT or Rreb1-KO 393T3 cells. Lung 
colonization load was quantified by BLI. Data are mean ± s.e.m.; n = 6 mice per 
group; two-tailed unpaired t-test. See also Extended Data Figs. 4-6. 


The induction of lethal EMT by TGF-β in KRAS-mutant pancreatic
progenitor cells is a barrier to PDA development. SMAD4-restored
PDA cells grew poorly as subcutaneous tumours in mice (Fig. 2d), were


568 | Nature | Vol577 | 23 January 2020 


undifferentiated (Fig. 2e) and contained cells expressing apoptosis
markers (Fig. 2f, h) and SNAIL (Fig. 2g, h). By contrast, Rreb1-knockout
cells had higher tumorigenic activity (Fig. 2d), with well-differentiated
epithelial histology (Fig. 2e) and few apoptotic (Fig. 2f, h) or SNAIL+
cells (Fig. 2g, h). Notably, RREB1 is downregulated in human PDA and
mutated in approximately 5% of PDA cases.

Activating KRAS mutations define a major subtype of human lung
adenocarcinoma (LUAD). 393T3 cells derived from a KrasG12D;p53−/−
mouse LUAD tumour showed ERK-dependent induction of Snail
and Has2 by TGF-β, followed by EMT without apoptosis (Extended
Data Fig. 5a-e). Rreb1 knockout inhibited the induction of EMT by
TGF-β and acutely diminished growth of 393T3 cells as subcutaneous
tumours and pulmonary metastatic colonies in mice (Fig. 2i, j, Extended
Data Fig. 5f-j). In A549, a KRAS-mutant human LUAD cell line, RREB1
knockout (Extended Data Fig. 6a) diminished SNAI1, SNAI2 (which
encodes SLUG) and EMT responses to TGF-β, and inhibited tumour
formation in mice (Extended Data Fig. 6b-d). Collectively, the results
indicate that RREB1 mediates TGF-β-induced EMT in PDA and LUAD
models independently of the tumorigenic phenotype associated
with EMT.


EMT-associated fibrogenic program 


The KRAS-dependent TGF-β response in pancreatic cancer progenitors
showed enrichment for gene signatures of cell adhesion, migration
and EMT (Extended Data Fig. 6e). Notably, a majority of the
13 KRAS-dependent genes induced by TGF-β were related to deposition
of fibrous connective tissue (Extended Data Fig. 1d). Four of
these genes encode inducers of extracellular matrix (ECM) production
by mesenchymal cells in fibrosis, including interleukin 11 (IL-11)
in cardiovascular and renal fibrosis, connective tissue growth factor
(CTGF, also known as CCN2) in glomerulonephritis, WNT-inducible
signalling pathway protein 1 (WISP1, also known as CCN4) in idiopathic
pulmonary fibrosis, and platelet-derived growth factor B (PDGFB)
in hepatic fibrosis. The gene set additionally includes the ECM proteins
laminin-α3 (Lama3), collagen 6α1 (Col6a1), collagen and calcium-binding
EGF domain-containing protein 1 (Ccbe1), the ECM protease
inhibitor serpin E1 (Serpine1) and Has2.

Induction of Il11, Wisp1, Serpine1, Pdgfb, Ccbe1, Has2 and Col6a1
by TGF-β in mouse PDA cells required RREB1 (Fig. 3a, Extended Data
Fig. 6f). RREB1 ChIP peaks overlapped with SMAD2/3 peaks in these
genes (Fig. 3b). In PDA cells, TGF-β induced expression of Snail and
Zeb1 as previously described (Extended Data Fig. 6g), and depletion of
SNAIL and ZEB1 (Extended Data Fig. 6h, i) inhibited EMT but not fibrogenic
gene responses (Extended Data Fig. 6j-m), showing that these
gene responses are integral, but experimentally divisible, components
of a common fibrogenic EMT program. Similar RREB1-dependent induction
of these fibrogenic genes and Snail by TGF-β occurred in 393T3
and A549 LUAD cells (Fig. 3a, c). 393T3 pulmonary nodules showed
marked presence of cancer-associated myofibroblasts and abundant
collagen deposition, whereas time-matched, size-matched Rreb1-knockout
393T3 nodules did not (Fig. 3d, e). Thus, TGF-β-activated
SMADs converge with RAS-activated RREB1 to drive fibrogenic EMTs
in PDA and LUAD cells.

Mammary ductal morphogenesis involves EMT. Mammary epithelial
cells undergo EMT in response to TGF-β; EMT induction by
TGF-β in normal mouse mammary gland (NMuMG) cells requires
ERK and RREB1 (Extended Data Fig. 7a-c). RREB1 mediated SMAD2/3
binding to the Snail locus and, to a lesser extent, the Has2 locus, and
induction of these genes by TGF-β (Extended Data Fig. 7d-f). ERKi
diminished binding of HA-RREB1 to regulatory regions of Snail and
Has2 in NMuMG cells (Extended Data Fig. 7g). The ERK-pathway activator
epidermal growth factor (EGF) increased, and ERKi suppressed,
these gene responses, whereas an inhibitor of the EGF receptor had
little effect on basal Snail and Has2 expression (Extended Data Fig. 7h),
Fig. 3 | RREB1 mediates a TGF-β fibrogenic response. a, Heat map of
fibrogenic-gene responses in WT and Rreb1-KO PDA cells after treatment with
SB505124 or TGF-β (Tβ) for 1.5 h. n = 2. b, Gene track view of SMAD2/3 and
HA-RREB1 ChIP-seq tags at indicated loci and experimental conditions. Gene
bodies are represented at the bottom of track sets. ChIP-seq was performed once and
an independent ChIP was performed in which selected genomic regions were
confirmed by qPCR. c, Heat map of fibrogenic genes in WT and Rreb1-KO 393T3
cells treated with SB505124 or TGF-β for 1.5 h. n = 4. d, Representative α-smooth
muscle actin (α-SMA) and PDGFRβ immunohistochemistry, Masson's
trichrome stain and α-SMA and GFP immunofluorescence of colonized lung
tissue after tail vein injection of WT or Rreb1-KO GFP+ 393T3 cells. Scale bars,
100 μm. e, Quantification of staining in d. n for each group is indicated in the graph;
two-tailed unpaired t-test; ****P < 0.0001, ***P < 0.001. Violin plots show all data
points, midline represents the median and dotted lines show first and third
quartiles. See also Extended Data Figs. 6, 7.


indicating that RREB1 is required for TGF-β-induced EMT in normal
mammary epithelial cells.


Basis for contextual EMT programs 

Next, we investigated the role of RREB1 during gastrulation, whereby
pluripotent epiblast cells undergo an EMT as they migrate and differentiate.
Nodal, FGF and WNT signals drive mesendodermal differentiation
and EMT in epiblast cells. In a spatially resolved RNA-sequencing
(RNA-seq) dataset, Rreb1 transcripts accumulated in the posterior
primitive streak domain at mid-gastrulation (embryonic day (E)7.0)
(Extended Data Fig. 8a), and overlapped with mesendoderm markers
Gsc and Brachyury (also known as T) and EMT markers Snail and
Cdh2 (N-cadherin) (Extended Data Fig. 8a). Mouse embryonic stem
(ES) cells form embryoid bodies recapitulating signalling and lineage
specification events of gastrulation. Expression of the mesendoderm
genes Eomes, Mixl1, T, goosecoid (also known as Gsc), Fgf8 and Wnt3
gradually increased after two days of embryoid body differentiation,
peaking on day 4 together with EMT drivers Snail, Twist1, Twist2 and
Zeb2, and Cdh2 (Fig. 4a, Extended Data Fig. 8b). EMT, stem cell differentiation
and gastrulation transcriptional signatures were enriched
in parallel (Fig. 4b). Rreb1 knockout (Extended Data Fig. 8c) inhibited
the expression of Snail and key mesendoderm genes (Extended Data
Fig. 8d). Addition of activin A (ligand for Nodal receptors) to day 3
embryoid bodies augmented the expression of mesendoderm and
Snail genes in wild-type but not Rreb1-knockout embryoid bodies
(Extended Data Fig. 8e).

RNA-seq analysis of wild-type and Rreb1-knockout ES cells under
pluripotency conditions (day 0) and after four days of embryoid
body differentiation (day 4) showed few differences between wild-type
and Rreb1-knockout cells on day 0, but a lack of differentiation
in Rreb1-knockout cells on day 4 (Fig. 4c), together with an absence of
stem cell differentiation, EMT and gastrulation gene signatures (Extended Data
Fig. 8f). Nodal and activin receptors signal through SMAD2/3. SMAD2/3
ChIP-seq peaks in day 3 embryoid bodies overlapped with HA-RREB1
ChIP peaks genome-wide (Fig. 4d, Extended Data Fig. 9a-c), providing
evidence for direct cooperation of SMADs and RREB1 in mesendoderm
differentiation and EMT.

The assay for transposase-accessible chromatin using sequencing
(ATAC-seq) revealed a shared major peak of chromatin accessibility on
the Snail promoter in embryoid bodies and PDA cells, which overlapped
with ChIP-seq SMAD2/3 and RREB1 binding profiles (Fig. 4d).
The ATAC-seq profile overlapped with the ChIP-seq profiles on differentiation
genes in day 3 embryoid bodies, and with Wisp1 and Serpine1
in PDA cells (Fig. 4d, Extended Data Fig. 9c). ATAC-seq revealed low
chromatin accessibility at Gsc and Mixl1 in PDA cells and at Wisp1 and
Serpine1 in embryoid bodies, suggesting that different chromatin accessibility
patterns enable SMAD2/3 and RREB1 access to Snail and Has2,
but with contextual restriction from fibrogenic and mesendoderm loci.


RREB1 requirement during gastrulation 


To determine whether RREB1 regulates gastrulation in vivo, we assessed
the development of chimeric embryos comprising Rreb1−/− ES cells
(Fig. 4e). Whereas Rreb1+/+ chimaeras generally developed normally, the
majority (approximately 75%) of Rreb1−/− ES-cell-containing embryos
exhibited severe morphological abnormalities (Extended Data Fig. 10a,
b). At E8.5, we observed aberrant development of neuroectoderm,
comprising irregular neural plate folding (Fig. 4f, g) and disproportionate
and bilaterally asymmetric headfolds (Extended Data Fig. 10c),
defective intersomitic boundaries (Fig. 4f), and ectopic somite-like
structures (Extended Data Fig. 10d). Some mutant chimaeras were so
defective that specific structures, including the primitive streak and
anterior-posterior axis, could not be discerned (Fig. 4f). We also noted
axis duplications, including duplications of the epiblast (Extended
Data Fig. 10c); posterior derivatives, including the allantois (Fig. 4f);
and anterior derivatives, including the headfolds (Extended
Data Fig. 10c, e).

At E7.5, approximately 75% of mutant embryo chimaeras were developmentally
retarded or morphologically abnormal (Extended Data
Fig. 10a, b, f). Similar to wild-type embryos, chimaeras containing wild-type
ES cells formed a primitive streak and expressed markers of
differentiation and EMT (Fig. 4h-j, Extended Data Fig. 10f). Although
mutant embryo chimaeras expressed T and SNAIL within the primitive
streak and nascent mesoderm (in both wild-type and Rreb1−/− cells), they
frequently showed an accumulation of cells in the posterior epiblast,
resulting in bulges into the amniotic cavity and/or a folded epiblast
layer containing multiple cavities (Fig. 4h, i, Extended Data Fig. 10f, g),
defects characteristic of gastrulation failure. No difference was
detected in the number of mitotic or apoptotic cells between wild-type
and mutant embryo chimaeras (Extended Data Fig. 10h, i).
Fig. 4 | RREB1 and SMAD regulate distinct context-dependent EMTs. a, Heat
map of regulated transcripts during embryoid body differentiation. RNA-seq
was performed at indicated times after shifting ES cells into differentiation
medium (without leukaemia inhibitory factor (LIF)). EMT (red) and
mesendoderm lineage genes (blue) are highlighted. n = 2. b, Gene set
enrichment analysis for the indicated signatures in day 4 embryoid bodies.
NES, normalized enrichment score. c, Regulated transcripts (fold change >4 or
<0.25) in WT and Rreb1-KO cells, on day 4 relative to day 0 of differentiation.
n = 2. d, Gene track view of ATAC-seq, and SMAD2/3 and RREB1 ChIP-seq tags at
indicated loci, in day 3 embryoid bodies (red tracks) versus TGF-β-treated
(1.5 h) PDA cells (blue tracks). ATAC-seq and ChIP-seq were performed once
and confirmed for selected genomic regions by qPCR.

e, Chimaeras generated by injecting WT Rreb1+/+ or mutant Rreb1−/− mCherry-tagged
ES cells into WT mouse blastocysts were transferred to pseudopregnant
females and dissected at E7.5-E8.5. f, h, Bright-field images of WT and Rreb1−/−
chimeric embryos at E8.5 (f) and E7.5 (h). Rreb1−/− chimaeras displayed
morphological defects. Arrowheads, somites (f), abnormal accumulation of


In wild-type embryos and chimaeras, there was a switch of E-cadherin
to N-cadherin as cells ingressed through the primitive streak (Fig. 4j).
In mutant embryo chimaeras, cells within the aberrant bulges or folds
continued to express E-cadherin and either did not strongly upregulate
N-cadherin (Fig. 4j) or co-expressed both cadherins, with some embryos
exhibiting ectopic N-cadherin within the posterior epiblast (Extended
Data Fig. 10j). Together, these data demonstrate that mutant cells do
not undergo a proper EMT at the primitive streak, resulting in gastrulation
defects. Notably, Rreb1-knockout cells did not exhibit an absolute
EMT block. Considering EMT as a continuum of states, several EMT
states (including Nodal- and RREB1-dependent EMTs) may overlap
temporally and spatially within the embryo.


Discussion 


The present work reveals how TGF-β and RAS-MAPK signals acting
jointly through SMAD and RREB1 transcription factors trigger EMTs in
cells within epiblast (h). g, i, j, Confocal images of whole-mount
immunostained chimaeras. g, Maximum intensity projections of E8.5
chimaeras showing abnormal neurectoderm and axis duplication in Rreb1−/−
chimaera. i, Sagittal section showing ectopic Brachyury expression, extensive
epiblast folding and multiple cavities in Rreb1−/− chimaera (right). The
anterior-posterior orientation of the embryo could not be determined. j, Sagittal sections of
whole chimaeras and representative sections through primitive streak region.
Arrowheads, abnormal epiblast folding. Yellow dashed lines, epiblast-mesoderm
boundary. Brackets, primitive streak. HF, headfold; NT, neural tube;
Al, allantois; Epi, epiblast; PS, primitive streak; ExM, extraembryonic
mesoderm; meso, mesoderm; A, anterior; P, posterior; Pr, proximal; Ds, distal;
L, left; R, right. Scale bars, 50 μm. Images in f-j are representative of two
independent experiments. k, Summary of RAS-dependent TGF-β or Nodal
effects, coordinately triggered by cooperation between RREB1 and SMAD2/3
to activate EMT and associated contextual programs in carcinoma progenitors
and pluripotent embryonic cells. Main target genes in each program and
context are indicated. See also Extended Data Figs. 8-10.


different contexts (Fig. 4k). EMT and mesendoderm differentiation are
entwined events during gastrulation, and our results shed light on
how they are linked. SMADs, via RREB1, directly regulate the expression
of EMT transcription factors and mesendoderm genes in pluripotent
progenitors, and of EMT transcription factors and fibrogenic factors
in carcinoma cells. The induction of SNAIL and that of fibrogenic mediators
are biologically coordinated but experimentally separable processes.
This level of coordination is distinct from, and adds to the role of, SNAIL
as inducer of downstream fibrogenic signals in renal fibrosis. EMTs
can couple to either morphogenic or fibrogenic events depending on
context, and our evidence points to an epigenetic basis for this contextual
nature of EMTs. With 15 zinc fingers and large interdomain
regions, RREB1 probably coordinates interactions between DNA,
SMAD proteins and other cofactors. RREB1 is an understudied
RAS effector, the structural and functional properties and genetic
alterations of which warrant further attention. The generality of the
TGF-β-SMAD-RREB1 mechanism as a trigger of diverse EMTs provides
common ground for the analysis of EMTs in developmental and regenerative
processes and paves the way for a better understanding of the
role of TGF-β in the pathogenesis of organ fibrosis and cancer.
Extended Data Fig. 1 | RREB1 as a SMAD cofactor in TGF-β gene responses.

a, YFP fluorescence images of CIY organoids expressing KRAS(G12D) under
doxycycline control treated with SB505124 or TGF-β for 2.5 days. Scale bars,
200 μm. Images are representative of two independent experiments.

b, Influence of KRAS(G12D) on TGF-β gene responses. CIY pancreatic organoids
inducibly expressing KRAS(G12D) were treated with SB505124 or TGF-β for
1.5 h and analysed by RNA-seq. Dots represent fold change (in log2) in mRNA
levels of individual genes under TGF-β versus SB505124 treatment conditions,
with KRAS(G12D) expression turned off (x axis) or on (y axis). Off-diagonal dots
correspond to TGF-β gene responses that were enabled (groups I and II) or
disabled (groups III and IV) by KRAS(G12D). Gene activation (I and III) and
repression responses (II and IV) are included. c, Heat map of four classes of
KRAS-modified TGF-β gene responses. n = 1. Representative result of two
independent experiments. Classes I-IV correspond to the off-diagonal genes
derived from the RNA-seq in b. d, TGF-β gene activation responses augmented
by KRAS(G12D) (class I responses) in CIY pancreatic organoids. Fold increase in
mRNA levels in TGF-β versus SB505124 treatment conditions in presence or
absence of inducible KRAS(G12D). e, Heat maps showing TGF-β induction of
Snail, Has2, Il11, Smad7 and Skil in four independent CIY mouse pancreatic
organoid lines with inducible KRAS(G12D) expression. n = 4. f, Heat map of the
indicated TGF-β gene responses in spheroid cultures of pancreatic epithelial
cells (PECs) inducibly expressing KRAS(G12D). n = 2. g, Heat map of the
indicated TGF-β gene responses in monolayer cultures of mouse
Kras@”?;Smad4".Cdkn2a"; Pdx1-cre (KSIC) PDA cell lines transduced with a
SMAD4 vector or an empty vector. n = 2. h, Transcription factor (TF)-binding
motifs enriched in KRAS-independent SMAD2/3 binding sites (left) and KRAS-dependent
SMAD2/3 binding sites (right). SMAD2/3 ChIP-seq analyses were
performed in SMAD4-restored PDA cells that were treated with SB505124
(2.5 μM) or TGF-β (100 pM) for 1.5 h. Transcription factor binding-motif
analyses were performed with PscanChIP. n = 821 peak regions (left). n = 778
peak regions (right). i, Motif enrichment analysis of RAS-regulated
transcription factors in KRAS-dependent (n = 778 peak regions) and KRAS-independent
(n = 821 peak regions) SMAD2/3 binding sites. j, Comparative
enrichment of classic SMAD binding motifs (CAGAC and GGCTG) and 5GC
motifs (GGC(GC)|(CG)) in a 200-bp region of SMAD2/3 ChIP peaks within 1,000
bp of a transcriptional start site. The relative enrichment is normalized to the
baseline dataset obtained from 20,000 random 200-bp regions from the
mm10 genome assembly. The 5GC motifs are enriched approximately fourfold
in SMAD2/3 ChIP peaks compared to the baseline, and the classic motifs are
enriched twofold.
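
As an illustrative aside, the baseline normalization described in panel j can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the names `motif_rate` and `relative_enrichment` are hypothetical, the 5GC pattern is read here as GGCGC or GGCCG, only the given strand is scanned, and the caller is assumed to supply the 200-bp peak sequences and the random mm10 baseline sequences as plain strings.

```python
import re

# Motif patterns wrapped in lookaheads so that overlapping hits are counted.
# CLASSIC_SBE covers the two classic SMAD binding motifs named in the caption.
# FIVE_GC is our reading of "GGC(GC)|(CG)" as GGCGC or GGCCG (an assumption).
CLASSIC_SBE = re.compile(r"(?=(CAGAC|GGCTG))")
FIVE_GC = re.compile(r"(?=(GGC(?:GC|CG)))")


def motif_rate(regions, pattern):
    """Return the mean number of motif hits per region (single strand only)."""
    if not regions:
        raise ValueError("regions must not be empty")
    hits = sum(len(pattern.findall(seq.upper())) for seq in regions)
    return hits / len(regions)


def relative_enrichment(peak_regions, baseline_regions, pattern):
    """Return the motif rate in peak regions divided by the baseline rate."""
    baseline = motif_rate(baseline_regions, pattern)
    if baseline == 0:
        raise ValueError("baseline contains no motif hits; ratio undefined")
    return motif_rate(peak_regions, pattern) / baseline
```

With real data, `peak_regions` would hold the 200-bp windows of SMAD2/3 ChIP peaks near transcriptional start sites and `baseline_regions` the random genomic windows; a value near 4 for `FIVE_GC` would correspond to the approximately fourfold enrichment reported above. A fuller analysis would also scan the reverse complement.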
Extended Data Fig. 2 | RREB1 interacts with SMAD and binds to TGF-β target
genes. a, Western blot analysis of RREB1 and HA-RREB1 levels in SMAD4-restored
PDA cells stably transduced with a HA-RREB1 vector. Tubulin
immunoblotting was used as loading control. Data are representative of two
independent experiments. b, Proximity ligation assay showing TGF-β-dependent
proximity between RREB1, SMAD2/3 and SMAD4 in the nucleus.
Scale bars, 30 μm. Data are representative of two independent experiments.

c, Quantification of PLA signals in b. Cell numbers (n) of each group are
indicated in the graph, two-tailed unpaired t-test. Data are mean ± s.d.
****P < 0.0001, ***P < 0.001. d, SMAD4-restored PDA cells expressing HA-RREB1
were treated with TGF-β for 1.5 h, lysed and immunoprecipitated (IP) with the
indicated antibodies. The immune complexes were collected and subjected to
western blot with the antibodies indicated on the left. Data are representative
of two independent experiments. e, Heat map of ChIP-seq tag densities for
SMAD2/3 and HA-RREB1 in genomic regions ±3 kb from the centre of SMAD2/3
binding peaks in SMAD4-restored PDA cells that were treated with SB505124 or
TGF-β for 1.5 h and subjected to SMAD2/3 and HA-RREB1 ChIP-seq analysis.
ChIP-seq was performed once, and an independent ChIP was performed in
which selected genomic regions were confirmed by qPCR.


Extended Data Fig. 3 | RREB1 is phosphorylated and regulated by ERK. a,
Representative immunofluorescence images of HA-RREB1 in SMAD4-restored
PDA cells treated with DMSO or 1 μM ERK inhibitor SCH772984 (ERKi) for 6 h.
Scale bar, 20 μm. Data are representative of two independent experiments.
b, c, Western blot analysis of RREB1 (b) or HA-RREB1 levels (c) in SMAD4-restored
PDA cells treated with DMSO, 1 μM ERKi or 1 μM AZD6244 (MEKi; an
inhibitor of the ERK-activating kinases MEK1/2) for the indicated time periods.
Tubulin immunoblotting was used as loading control. Data are representative
of two independent experiments. d, ChIP-PCR analysis of HA-RREB1 binding
to the indicated sites (Figs. 1g, 3b) in Snail, Has2 and Il11 in SMAD4-restored
PDA cells that were treated with vehicle (DMSO) or ERKi (1 μM) for 6 h.
Mean ± s.e.m. n = 3, two-way ANOVA. **P < 0.01, ***P < 0.001, ****P < 0.0001. e,
SMAD4-restored PDA cells expressing HA-RREB1 were treated with ERKi for
the indicated length of time. HA-RREB1 was tested for binding to Snail DE2 and
Has2 PP1 double-stranded DNA oligonucleotide probes in DNA affinity
precipitation assays. Data are representative of two independent experiments.
f, Schematic of RREB1. Each tick represents a previously annotated
phosphorylation site in PhosphoSitePlus identified in at least two independent
mass spectrometry experiments. Red filled circles represent high
stoichiometry (>15%) phosphorylation sites that are inhibited by ERKi, as
identified in g. Zinc-finger domains annotated in UniProt are shown.

g, Phosphorylation stoichiometry of four ERK-dependent RREB1
phosphorylation sites in SMAD4-restored PDA cells, as determined by SILAC
mass spectrometry of cells treated with DMSO (control) in light medium or
ERKi in heavy medium for 6 h. h, Summary of ERK-dependent RREB1
phosphorylation sites, sequence motifs and phosphorylation stoichiometry.
i, Rreb1-KO PDA cells were transduced with the indicated RREB1-WT or
phosphorylation-site mutant constructs, then treated with SB505124 or TGF-β
for 1.5 h. mRNA levels of Snail and Has2 were determined by qPCR with reverse
transcription. Mean ± s.e.m. n = 4, two-way ANOVA. ***P < 0.001, ****P < 0.0001.
j, ChIP-PCR analysis of HA-RREB1 binding to the indicated sites in Rreb1-KO
PDA cells transduced with the indicated RREB1-WT or phosphorylation-site
mutant constructs. Mean ± s.e.m. n = 4, two-tailed unpaired t-test.
***P < 0.0001.




[Extended Data Fig. 4, panels a-g (figure content not recoverable from extraction): a, Rreb1 exon-intron scheme with sgRNA target sites; b, WT and KO1/KO2 allele sequences at Rreb1 5'UTR and CDS exon 1 (chr13:37,893,648-37,894,060) and CDS exon 7 (chr13:37,929,572-37,929,703), with annotated lesions including a 403-bp deletion, a 12-bp insertion, a 224-bp deletion, a 102-bp deletion and a 101-bp inversion; c, d, western blots for RREB1, E-cadherin and tubulin; f, Smad7 gene tracks; g, Snail, Has2 and Il11 mRNA levels in WT, KO1 and KO2 cells.]




Extended Data Fig. 4 | RREB1 mediates KRAS-dependent TGF-β responses in
PDA cells. a, Scheme of CRISPR-Cas9-mediated mutation of Rreb1 in mouse
SMAD4-restored PDA cells. b, sgRNA sequences and genomic sequences of
Rreb1 coding region (CDS) exons 1 and 7 in mutant clones KO1 and KO2 derived
from SMAD4-restored PDA cells. c, Western blot analysis of RREB1 levels in WT
and Rreb1-KO cells. Tubulin immunoblotting was used as loading control. Data
are representative of two independent experiments. d, Western blot analysis of
E-cadherin in mouse KSIC PDA cells, SMAD4-restored PDA cells and two Rreb1-KO
SMAD4-restored PDA clones, treated with SB505124 or TGF-β for 24 h.
Tubulin immunoblotting was used as loading control. Data are representative
of two independent experiments. e, Representative E-cadherin
immunofluorescence and DAPI staining of the same cells as in d treated with
SB505124 or TGF-β for 48 h. Scale bars, 100 μm. Data are representative of two
independent experiments. f, Gene track view of SMAD2/3 ChIP-seq tags in the
Smad7 locus of the WT and Rreb1-KO PDA cells. The gene body is schematically
represented at the bottom. ChIP-seq was performed once and an independent
ChIP was performed in which selected genomic regions were confirmed by
qPCR. g, mRNA levels of Snail, Has2 and Il11 in WT and two Rreb1-KO cells that
were transduced with an RREB1 vector or empty vector and then treated with
SB505124 or TGF-β for 1.5 h. Mean ± s.e.m. n = 4; two-way ANOVA; ****P < 0.0001.


Extended Data Fig. 5 | RREB1 mediates tumorigenic EMT in lung
adenocarcinoma cells. a, Snail, Has2 and Il11 mRNA levels in 393T3 mouse
LUAD cells treated with DMSO (Ctrl) or ERKi (SCH772984, 1 μM) for 6 h,
followed by treatment with SB505124 or TGF-β for 1.5 h. Mean ± s.e.m. n = 4;
two-way ANOVA; ****P < 0.0001. b, Cdh1 mRNA levels in 393T3 cells with the
indicated treatments for 48 h. Mean ± s.d. n = 4; two-way ANOVA. c, Western
blot analysis of E-cadherin in 393T3 cells with the indicated treatments for 48 h.
Tubulin immunoblotting was used as loading control. Data are representative
of two independent experiments. d, SMAD4-restored PDA cells and 393T3
LUAD cells cultured in D10F containing 2.5 μM MK2206 were treated with
SB505124 (2.5 μM) or TGF-β (100 pM) and assayed for cleaved caspase 3/7
activity at the indicated times. Mean ± s.e.m. n = 4; two-way ANOVA;
**P < 0.001. e, SMAD4-restored PDA cells and 393T3 cells cultured in D10F
containing 2.5 μM MK2206 were treated with SB505124 or TGF-β. Cell viability
was determined at the indicated times. Mean ± s.e.m. n = 4; two-way ANOVA;
***P < 0.001. f, sgRNA sequence targeting Rreb1 CDS exon 3, and mutant Rreb1
genomic sequences of the resulting 393T3 KO1 and KO2 clones. g, mRNA levels
of Snail and Has2 in the WT and Rreb1-KO 393T3 cells after treatment with
SB505124 (2.5 μM) or TGF-β (100 pM) for 1.5 h. Mean ± s.e.m. n = 4; two-way
ANOVA; ****P < 0.0001. h, Phase contrast images of 393T3 cell monolayers
treated with SB505124 or TGF-β for 48 h. Scale bars, 200 μm. Data are
representative of two independent experiments. i, Weight and volume of
tumours in Fig. 2i. Mean ± s.e.m. n = 10, two sites were inoculated per mouse;
two-tailed unpaired t-test; ****P < 0.0001. j, Representative haematoxylin and
eosin staining images of indicated lung tissue sections from Fig. 2j. Scale bars,
200 μm. Data are representative of two independent experiments.


Extended Data Fig. 6 | RREB1-dependent TGF-β responses in LUAD and PDA
cells. a, sgRNA sequence targeting RREB1 CDS exon 3 and mutant RREB1
genomic sequences of the resulting A549 KO1 and KO2 clones. b, SNAIL and
SLUG mRNA levels in WT A549 and two RREB1-KO clones treated with SB505124
or TGF-β for 24 h. Mean ± s.e.m. n = 4; two-way ANOVA; ****P < 0.0001. c, Phase-contrast
images of WT A549 and RREB1-KO cell monolayers treated with
SB505124 or TGF-β for 48 h. Scale bars, 200 μm. Data are representative of two
independent experiments. d, Growth kinetics of tumours formed by
subcutaneously inoculated WT or RREB1-KO A549 cells in athymic mice, as
determined by BLI of a transduced firefly luciferase gene in the cells.
Mean ± s.e.m. n = 10, two sites were inoculated per mouse; two-way ANOVA. e,
Gene ontology analysis of TGF-β response genes in CIY organoids inducibly
expressing KRAS(G12D), based on the RNA-seq in Extended Data Fig. 1b. f, WT
and Rreb1-KO PDA cells were treated with SB505124 or TGF-β for 1.5 h and
analysed by RNA-seq. Dots represent fold change (in log2) in mRNA levels of
individual genes under TGF-β versus SB505124 treatment conditions, in Rreb1-KO
(x axis) or WT cells (y axis). Off-diagonal dots corresponding to Snail, Has2,
Il11 and Wisp1 are highlighted. g, Induction of Snail and Zeb1 expression by
TGF-β in mouse PDA cells. Mean ± s.d. n = 4. h, sgRNA sequence targeting Snail
and resulting mutant Snail genomic sequences in mouse PDA cells. i,
Knockdown of Zeb1 with two independent shRNAs in SNAIL-KO mouse PDA
cells (KOsh cells). Mean ± s.d. n = 4. j, Fibrogenic gene responses to TGF-β in WT
and SNAIL- and ZEB1-double-depleted KOsh PDA cells. Mean ± s.d. n = 4. k-m,
E-cadherin levels (k), phase-contrast images (l) and E-cadherin and Zeb1
immunofluorescence (m) in WT and KOsh PDA cells that were treated with
SB505124 or TGF-β for 48 h. Scale bars, 100 μm. Data are representative of two
independent experiments.
Extended Data Fig. 7 | RREB1-dependent TGF-β responses in mammary
epithelial cells. a, sgRNA sequence targeting RREB1 CDS exon 3 and mutant
RREB1 genomic sequences of the resulting NMuMG KO1 and KO2 clones. b,
Phase-contrast images of WT and Rreb1-KO NMuMG cell monolayers treated
with SB505124 or TGF-β for 48 h. Scale bar, 100 μm. Data are representative of
two independent experiments. c, Western blot analysis of E-cadherin in WT and
Rreb1-KO NMuMG cells, treated with SB505124 or TGF-β for 48 h. β-actin
immunoblotting was used as loading control. Data are representative of two
independent experiments. d, WT and Rreb1-KO NMuMG cells were treated with
SB505124 or TGF-β for 1.5 h and analysed by RNA-seq. Dots represent fold
change (in log2) in mRNA levels of individual genes under TGF-β versus
SB505124 treatment conditions, in Rreb1-KO (x axis) or WT cells (y axis). Off-diagonal
dots corresponding to Snail and Has2 are highlighted. e, ChIP-PCR
analysis of SMAD2/3 binding to the Snail (DE2) and Has2 (UE1) regions (Fig. 1g)
in WT and Rreb1-KO NMuMG cells. Cells were treated with 2.5 μM SB505124 or
100 pM TGF-β for 1.5 h. Mean ± s.e.m. n = 4; two-way ANOVA; ***P < 0.001,
****P < 0.0001. f, mRNA levels of Snail and Has2 in WT and Rreb1-KO NMuMG
cells after treatment with SB505124 or TGF-β for 1.5 h. Mean ± s.e.m. n = 4; two-way
ANOVA; ****P < 0.0001. g, ChIP-PCR analysis of HA-RREB1 binding to the
indicated Snail and Has2 regions in NMuMG cells that were treated with vehicle
(DMSO) or the ERKi SCH772984 (1 μM) for 6 h. Mean ± s.e.m. n = 3; two-tailed
unpaired t-test. h, Snail and Has2 mRNA levels in NMuMG cells treated with
DMSO (Ctrl), ERKi (1 μM SCH772984), EGF (10 ng ml−1, 10 min) or EGFR inhibitor
(gefitinib, 1 μM, 2 h), followed by SB505124 or TGF-β treatment for another 1.5 h.
Mean ± s.e.m. n = 4; two-way ANOVA; ***P < 0.001, ****P < 0.0001.
Extended Data Fig. 8 | RREB1 in gastrulation EMT and mesendoderm
differentiation. a, Corn plot presentation of Rreb1, Snail, Cdh2, Gsc and T in
E7.0 mouse embryo. A, anterior; L, left; R, right; P, posterior regions. Each dot
represents transcript level at a specific positional address. Heat map denotes
expression level of each gene computed from transcript counts in RNA-seq
datasets. b, Reads per million reads (RPM) of Rreb1, Snail, Twist1, Cdh2, Eomes
and Zeb2 in the RNA-seq dataset at the indicated times after shifting ES cells
into LIF-deficient embryoid body differentiation medium. c, sgRNA sequence
targeting Rreb1 CDS exon 3, and mutant Rreb1 genomic sequences of four
resulting mouse ES cell KO clones. d, mRNA levels of EMT (Snail and Cdh2) and
mesendoderm differentiation genes (Eomes, Gsc, T and Mixl1) in WT and four
independent Rreb1-KO clones on day 4 of embryoid body differentiation.
Mean ± s.d. n = 4; two-way ANOVA; ****P < 0.0001. e, mRNA levels of the
indicated genes in WT and four independent Rreb1-KO clones treated with
receptor inhibitor (SB505125) or activin A (AC) for 2 h. Mean ± s.e.m. n = 4;
two-way ANOVA; ****P < 0.0001. f, Gene set enrichment analysis for gastrulation,
EMT and stem cell differentiation genes in WT cells, and absence in Rreb1-KO
cells at day 4 of embryoid body differentiation.
Extended Data Fig. 9 | RREB1 and SMAD contextually regulate EMT genes. a,
Heat map of ChIP-seq tag densities for SMAD2/3 and HA-RREB1 in genomic
regions ±3 kb from centre of 3,422 high-confidence SMAD2/3 binding peaks in
day 3 embryoid bodies subjected to SMAD2/3 and HA ChIP-seq analyses. b,
Gene track view of SMAD2/3 and HA-RREB1 ChIP-seq tags in the loci of EMT
genes (Has2, Twist1 and Zeb1) and early mesendoderm lineage genes (Eomes, T
and Mixl1) in day 3 embryoid bodies. Gene bodies are schematically
represented at the bottom of each track set. c, Gene track view of ATAC-seq and
SMAD2/3 and RREB1 ChIP-seq tags on indicated loci, in day 3 embryoid bodies
(red tracks) versus TGF-β-treated (1.5 h) SMAD4-restored PDA cells (blue
tracks). In a–c, ChIP-seq was performed once and an independent ChIP was
performed in which selective genomic regions were confirmed by qPCR.


Extended Data Fig. 10 | Rreb1−/− mouse embryo chimaeras exhibit defects in
early development. a, E7.5 and E8.5 chimeric embryos containing WT ES cells
or Rreb1−/− ES cells were scored, on the basis of gross morphology, as normal or
mild defects, developmentally retarded or severely abnormal. At E7.5, a
fraction of Rreb1* ES cell embryos displayed small clumps of cells in the
amniotic cavity, possibly an artefact from the microinjection, and were
therefore scored as abnormal. Rreb1−/− data are compiled from four distinct KO
clones. b, Bright-field morphology and mCherry fluorescence (marking
descendants of injected ES cells) in representative litters of Rreb1−/−
ES-cell-containing chimeric embryos dissected at E7.5 and E8.5. nc, non-chimeric;
lc, low chimaerism. Asterisks mark morphologically abnormal or
developmentally retarded embryos. c, Bright-field images of morphologically
abnormal Rreb1−/− ES-cell-containing chimeric E8.5 embryos. Embryos
exhibited abnormal headfold development, including disproportionate
headfolds (i) and asymmetric headfolds (ii). Axis duplication was also
observed (iii and iv). Of note, the embryo in (iii) is also developmentally
retarded. d, e, Confocal maximum intensity projections of whole-mount
immunostained E8.5 Rreb1−/− ES-cell-containing chimeric embryos. d, An
embryo with an ectopic somite-like structure (arrowhead). e, The embryo in c
(iv) with axis duplication of the headfolds. f, Sagittal confocal optical sections
of whole-mount immunostained chimeric E7.5 embryos. Embryos shown in f
(i, ii) have multiple cavities and multiple expression sites of SNAIL, hence
anterior-posterior axis orientation is not possible. g, Bright-field images of
morphologically abnormal Rreb1−/− ES-cell-containing chimeric E7.5 embryos.
Embryos frequently had protrusions into the cavity and thickening of the
posterior epiblast, marked by arrowheads. h, i, Confocal maximum intensity
projections of chimeric embryos after whole-mount immunostaining for
phospho-histone H3 (h), labelling mitotic cells, and cleaved caspase 3 (i),
labelling apoptotic cells. Brackets demarcate the primitive streak. j, Sagittal
confocal optical sections of chimeric E7.5 embryos after whole-mount
immunostaining for E-cadherin and N-cadherin. Arrowhead, aberrant
N-cadherin expression. Scale bars, 50 μm. Images in b–j are representative of
two independent experiments.
Statistical parameters


When statistical analyses are reported, confirm that the following items are present in the relevant location (e.g. figure legend, table legend, main 
text, or Methods section). 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


An indication of whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistics including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND
variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Clearly defined error bars 
State explicitly what error bars represent (e.g. SD, SE, CI)
Software and code


Policy information about availability of computer code 


Data collection BLI data were acquired using an IVIS Spectrum Xenogen instrument (Caliper Life Sciences) with Living Image software v.2.50. Histology data
for H&E and IHC were acquired using a Mirax scanner. qRT-PCR data were acquired using a ViiA 7 real-time PCR system (Applied Biosystems).
Western blot data were acquired using Licor Image Studio V2. RNA-seq, ChIP-seq and ATAC-seq data have been deposited in GEO under
the accession numbers provided in the Methods section.


Data analysis Statistical analysis: GraphPad Prism v 8.1.2 
Image processing and analysis: ImageJ v 2.0.0 
BLI data analysis: Living Image software v 2.50 
Western blot images and analysis: ImageStudioLite v 5.2.5 
RNA-seq, ChIP-seq and ATAC-seq analysis: FastQC v 0.11.5, GNUparallel v 2.5.2b, STAR v 0.6.1p1, HTSeq v 3.4, DESeq2 v 3.4, Bowtie2 v 
2.3, Samtools v 0.1, R v3.5.0, GSEA v 4.0.0, DAVID v 6.8, MACS v 1.4.2, HOMER v 4.10, Pscan-ChIP v 1.3. Custom code is available on 
request. 
Mass spectrometric data analysis: MaxQuant v 1.5.3.30. 
Data


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 
- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


The raw and processed RNA-seq, ChIP-seq and ATAC-seq data are deposited on GEO (GSE118765 and GSE128958): 
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE118765 
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128958 

The following secure token has been created to allow review of record: 

GSE118765: ersrqismjhypxex 

GSE128958: ynupygawfdefdcp 
Field-specific reporting


Please select the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 


[X] Life sciences   [ ] Behavioural & social sciences   [ ] Ecological, evolutionary & environmental sciences


For a reference copy of the document with all sections, see nature.com/authors/policies/ReportingSummary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size For high-throughput sequencing experiments, two independent biological replicates were used where possible to strengthen the
conclusions of the study. Sample sizes for other assays (qPCR, ChIP-PCR, etc.) were chosen on the basis of our prior experience and common
standards in the field for detecting statistically significant differences between conditions.


Data exclusions No data were excluded from the studies.
Replication All attempts at replication were successful. The number of biological replicates for each experiment is stated in the corresponding
figure legend. Moreover, findings were repeatedly reproduced throughout the study: RNA-seq with qRT-PCR in multiple cell lines; protein levels
with Western blot and immunostaining; cell response assays with different cell line models.


Randomization In animal experiments, mice were randomly assigned to each experiment group.


Blinding Investigators were blinded to cell groups and mouse groups when collecting results. 


Reporting for specific materials, systems and methods 


Materials & experimental systems (n/a | Involved in the study)
Unique biological materials
Antibodies
Eukaryotic cell lines
Palaeontology
Animals and other organisms
Human research participants

Methods (n/a | Involved in the study)
ChIP-seq
Flow cytometry
MRI-based neuroimaging


Unique biological materials 


Policy information about availability of materials 


Obtaining unique materials: DNA plasmids: pLVX-Tight-Puro-KrasG12D, pLVX-IRES-Hyg-SMAD4 (Addgene #107128), SGEP (Addgene #111170), pLenti-HA-
Rreb1, pSpCas9(BB)-2A-GFP (PX458) (Addgene #48138), pSpCas9(BB)-2A-Puro (PX459) (Addgene #48139), pLVX-EF1a-IRES-
mCherry (Clontech, Cat #631987). Plasmids generated specifically for this publication are available through authors upon
request. These materials will also be available through Addgene upon publication of this manuscript.


Antibodies


Antibodies used

Antibody name | Vendor | Catalog # | Lot # | Dilution
Rabbit monoclonal anti-E-cadherin | Cell Signaling Technology | Cat #3195 | Lot #13 | 1:1000
Rat monoclonal anti-E-cadherin | Abcam | Cat #ab11512 | Lot #GR183496-1 | 1:1000
Rat monoclonal anti-E-cadherin | Sigma | Cat #U-3254 | Lot #077M4800V | 1:1000
Rabbit polyclonal anti-ZEB1 | Santa Cruz | Cat #sc-25388 | Lot #unknown | 1:1000
Rabbit monoclonal anti-SMAD2/3 | Cell Signaling Technology | Cat #8685 | Lot #6 | 1:1000
Rabbit monoclonal anti-HA-Tag | Cell Signaling Technology | Cat #3724 | Lot #unknown | 1:1000
Mouse monoclonal anti-FLAG M2 | Sigma | Cat #F1804 | Lot #SLBT7654 | 1:2000
Rabbit polyclonal anti-RREB1 | Genway Biotech | Cat #GWB-5B0668 | Lot #27022 | 1:500
Mouse monoclonal anti-α-Tubulin | Sigma | Cat #T6199 | Lot #unknown | 1:10,000
Rabbit polyclonal anti-Cleaved Caspase-3 | Cell Signaling Technology | Cat #9661 | Lot #43 | 1:1000
Goat polyclonal anti-Brachyury | R&D | Cat #AF2085 | Lot #unknown | 1:200
Rabbit polyclonal anti-phospho-H3 | Millipore | Cat #06-570 | Lot #6570 | 1:300
Rabbit polyclonal anti-N-cadherin | Santa Cruz | Cat #sc-7939 | Lot #unknown | 1:300
Rabbit polyclonal anti-RFP | Rockland | Cat #600-401-379 | Lot #42393 | 1:300
Goat polyclonal anti-SNAIL | R&D | Cat #AF3639 | Lot #XRSO218061 | 1:100
Rat monoclonal anti-SOX2 | eBioscience | Cat #14-9811-82 | Lot #2023691 | 1:200
Rabbit polyclonal anti-alpha SMA | Abcam | Cat #ab5694 | Lot #GR248336-4 | 1:1000
Rabbit monoclonal [Y92] anti-PDGFR beta | Abcam | Cat #ab32570 | Lot #GR212663-49 | 1:1000
Mouse monoclonal anti-β-Actin | Cell Signaling Technology | Cat #3700 | Lot #unknown | 1:1000
Donkey anti-rabbit Alexa 568 | Invitrogen | Cat #A10042 | Lot #1964370 | 1:500
Donkey anti-goat Alexa 488 | Invitrogen | Cat #A11055 | Lot #830720 | 1:500
Donkey anti-rat DyLight 650 | Invitrogen | Cat #SA5-10029 | Lot #UF2789721 | 1:500
Donkey anti-rat Alexa 488 | Invitrogen | Cat #A21208 | Lot #1932496 | 1:500
Donkey anti-rabbit Alexa 647 | Invitrogen | Cat #A31573 | Lot #1903516 | 1:500
Donkey anti-rabbit Alexa 488 | Invitrogen | Cat #A21206 | Lot #2045215 | 1:500


Validation: A validation statement for each primary antibody is provided on the manufacturer's website.


Eukaryotic cell lines


Policy information about cell lines


Cell line source(s): The mouse KSIC PDA cell line was provided by Nabeel Bardeesy. Mouse CIY pancreatic organoid lines were generated from CIY
mice. The mouse lung epithelial cell line 393T3 was a gift from Taylor Jacks. Other cell lines (human lung epithelial cell
line A549, mouse mammary gland epithelial cell line NMuMG and mouse embryonic stem cell line E14Tg2a.IV) were
purchased from ATCC.


Authentication: The mouse KSIC PDA cell line, CIY pancreatic organoid lines and 393T3 cell line were genotyped by PCR amplification. A549 cells
were authenticated by STR profiling (100% match). NMuMG and E14Tg2a.IV cells were validated by RNA-seq analysis of their
signature gene expression profile.


Mycoplasma contamination: All cell lines tested negative for mycoplasma contamination.


Commonly misidentified lines (See ICLAC register): None of the cell lines used are listed as commonly misidentified lines in the ICLAC database.


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals: Pdx1-Cre; Cdkn2a-/-; LSL-YFP (CIY) and Pdx1-Cre; KrasG12D+/-; Cdkn2a-/-; Smad4-/- (KSIC) mice were provided by Nabeel
Bardeesy. FVB/NJ (strain 001800) and B6129SF1/J (strain 101043) mouse strains were obtained from the Jackson Laboratory.
Athymic nude mice (Hsd:Athymic Nude-Foxn1nu, 069) were obtained from Envigo. For all cancer cell injection studies, female
mice aged 5-7 weeks were used. All animal experiments were conducted in accordance with protocols approved
by the MSKCC Institutional Animal Care and Use Committee and were in compliance with the relevant ethical regulations
regarding animal research.


Wild animals: The study did not involve wild animals.


Field-collected samples: The study did not involve samples collected in the field.
ChIP-seq


Data deposition 


Confirm that both raw and final processed data have been deposited in a public database such as GEO. 


Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks. 


Data access links https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE118765 
May remain private before publication. The following secure token has been created to allow review of record GSE118765 while it remains in private status: 
ersrqismjhypxex. 


Files in database submission RAW FILES:
05_806_S4_Input_R1.combine.fastq.gz
06_806_S4_SB_SMAD23_R1.combine.fastq.gz
07_806_S4_TGFb_S23_R1.combine.fastq.gz
08_806_S4_RREB1_Input_R1.combine.fastq.gz
09_806_S4_RREB1_SB_HA_R1.combine.fastq.gz
10_806_S4_RREB1_TGFb_HA_R1.combine.fastq.gz
1_806_S4_RREB1_WT_Input_R1_001.fastq.gz
1_806_S4_RREB1_WT_Input_R2_001.fastq.gz
12_806_S4_RREB1_WT_SB_SMAD23_R1_001.fastq.gz
806_S4_RREB1_WT_TGFb_SMAD23_R1_001.fastq.gz
806_S4_RREB1_WT_TGFb_SMAD23_R2_001.fastq.gz
806_S4_RREB1_KO_Input_R1_001.fastq.gz
806_S4_RREB1_KO_Input_R2_001.fastq.gz
806_S4_RREB1_KO_SB_SMAD23_R1_001.fastq.gz
806_S4_RREB1_KO_SB_SMAD23_R2_001.fastq.gz
16_806_S4_RREB1_KO_TGFb_SMAD23_R1_001.fastq.gz
16_806_S4_RREB1_KO_TGFb_SMAD23_R2_001.fastq.gz
33_EB_D3_Input_R1_001.fastq.gz
33_EB_D3_Input_R2_001.fastq.gz
34_EB_D3_SMAD23_R1_001.fastq.gz
34_EB_D3_SMAD23_R2_001.fastq.gz
35_EB_D3_RREB1_R1_001.fastq.gz
35_EB_D3_RREB1_R2_001.fastq.gz
36_ATAC_EB_D3_R1_001.fastq.gz
36_ATAC_EB_D3_R2_001.fastq.gz
PROCESSED DATA FILES:
02_806_S4_Input_Tags.ucsc.bedGraph.gz
03_806_S4_SB_SMAD23_Tags.ucsc.bedGraph.gz
04_806_S4_TGFb_SMAD23_Tags.ucsc.bedGraph.gz
05_806_S4_RREB1_Input_Tags.ucsc.bedGraph.gz
06_806_S4_RREB1_SB_HA_Tags.ucsc.bedGraph.gz
07_806_S4_RREB1_TGFb_HA_Tags.ucsc.bedGraph.gz
08_806_S4_RREB1_WT_Input.sam_Tags.ucsc.bedGraph.gz
09_806_S4_RREB1_WT_SB_SMAD23.sam_Tags.ucsc.bedGraph.gz
10_806_S4_RREB1_WT_TGFb_SMAD23.sam_Tags.ucsc.bedGraph.gz
11_806_S4_RREB1_KO_Input.sam_Tags.ucsc.bedGraph.gz
12_806_S4_RREB1_KO_SB_SMAD23.sam_Tags.ucsc.bedGraph.gz
13_806_S4_RREB1_KO_TGFb_SMAD23.sam_Tags.ucsc.bedGraph.gz
16_EB_D3_Input_Tags.ucsc.bedGraph.gz
17_EB_D3_SMAD23_Tags.ucsc.bedGraph.gz
18_EB_D3_RREB1_Tags.ucsc.bedGraph.gz
19_ATAC_EB_D3_Tags.ucsc.bedGraph.gz


Genome browser session (e.g. UCSC) https://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=suj&hgS_otherUserSessionName=Submission


Methodology 


Replicates All ChIP-seq experiments were performed 1 or 2 times, and confirmed with ChIP-PCR experiment. 

Sequencing depth 50 bp single-end or paired-end sequencing was performed to obtain 30 million read depth 

Antibodies Rabbit monoclonal anti-SMAD2/3 Cell Signaling Technology Cat #8685; Rabbit monoclonal anti-HA-Tag Cell Signaling 
Technology Cat #3724. 

Peak calling parameters Peak calling from ChIP-seq data was performed with MACS 1.4.2 and verified by HOMER (v4.10). The parameters for peak
calling included fold change > 8 and P value < 1e-8 to detect high-confidence binding events. Input samples were used as
reference controls for background correction.


Data quality Details of data analysis and quality assurance are in the methods section 


Software For QC: FastQC v0.11.5
For alignment: Bowtie2
For ChIP-seq sorting, normalization and visualization: Samtools and HOMER
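
The peak-calling thresholds quoted above (fold change > 8, P value < 1e-8) can be illustrated with a minimal filtering sketch. This is not the authors' code. The record layout, the field names `fold_enrichment` and `score`, the example values, and the assumption that significance is stored as a -10*log10(P) score are all hypothetical and chosen only for illustration.

```python
# Minimal sketch: keep only peaks that pass both thresholds named in the text.
# Field names and example values are hypothetical, not taken from the study.

MIN_FOLD = 8.0   # fold change must exceed this value
MAX_P = 1e-8     # P value must be below this value


def score_to_p(score):
    """Convert an assumed -10*log10(P) score back to a P value."""
    return 10 ** (-score / 10.0)


def is_high_confidence(fold_enrichment, p_value):
    """Return True when a peak passes both the fold-change and P-value cut-offs."""
    return fold_enrichment > MIN_FOLD and p_value < MAX_P


# Three made-up peaks: one passes, one fails on P value, one fails on fold change.
peaks = [
    {"name": "peak_a", "fold_enrichment": 12.4, "score": 153.2},
    {"name": "peak_b", "fold_enrichment": 9.1, "score": 62.0},
    {"name": "peak_c", "fold_enrichment": 5.7, "score": 210.5},
]

kept = [
    p["name"]
    for p in peaks
    if is_high_confidence(p["fold_enrichment"], score_to_p(p["score"]))
]
print(kept)
```

Applying both cut-offs together is what restricts the set to high-confidence binding events; relaxing either one alone would admit peaks such as the second or third example.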




Article 


An anti-CRISPR viral ring nuclease subverts
type III CRISPR immunity


https://doi.org/10.1038/s41586-019-1909-5

Received: 24 July 2019
Accepted: 14 November 2019
Published online: 15 January 2020


Januka S. Athukoralage, Stephen A. McMahon, Changyi Zhang, Sabine Grüschow,
Shirley Graham, Mart Krupovic, Rachel J. Whitaker, Tracey M. Gloster &
Malcolm F. White

The CRISPR system in bacteria and archaea provides adaptive immunity against
mobile genetic elements. Type III CRISPR systems detect viral RNA, resulting in the
activation of two regions of the Cas10 protein: an HD nuclease domain (which
degrades viral DNA) and a cyclase domain (which synthesizes cyclic oligoadenylates
from ATP). Cyclic oligoadenylates in turn activate defence enzymes with a
CRISPR-associated Rossmann fold domain, sculpting a powerful antiviral response that
can drive viruses to extinction. Cyclic nucleotides are increasingly implicated in
host-pathogen interactions. Here we identify a new family of viral anti-CRISPR
(Acr) enzymes that rapidly degrade cyclic tetra-adenylate (cA₄). The viral ring
nuclease AcrIII-1 is widely distributed in archaeal and bacterial viruses and in
proviruses. The enzyme uses a previously unknown fold to bind cA₄ specifically, and a
conserved active site to rapidly cleave this signalling molecule, allowing viruses to
neutralize the type III CRISPR defence system. The AcrIII-1 family has a broad host
range, as it targets cA₄ signalling molecules rather than specific CRISPR effector
proteins. Our findings highlight the crucial role of cyclic nucleotide signalling in the
conflict between viruses and their hosts.


Previously, we identified in the archaeon Sulfolobus solfataricus a
family of cellular enzymes, referred to hereafter as the CRISPR-associated
ring nuclease 1 (Crn1) family, that degrades cA₄ molecules and
deactivates the cA₄-dependent RNase Csx1. This enzyme is thought
to act by mopping up cA₄ molecules in the cell without compromising
the immunity provided by the type III CRISPR system. In the absence
of such a mechanism to remove cyclic oligoadenylates (cOAs) following
the clearance of viral infections, cells could be pushed towards
dormancy or cell death under inappropriate circumstances.
Unsurprisingly, viruses have responded to the threat of the CRISPR
system by evolving a range of anti-CRISPR (Acr) proteins, which are
used to inhibit and overcome the cell’s CRISPR defences using a variety
of mechanisms (reviewed in ref. °). Acrs have been identified for
many of the CRISPR effector subtypes, and number more than 40
families.

Here we investigate the DUF1874 protein family, which is conserved
and widespread in a variety of archaeal viruses and plasmids,
bacteriophages and prophages (Extended Data Fig. 1), for an Acr function.
Structures are available for several members of the DUF1874 family,
including gp29 from Sulfolobus islandicus rod-shaped virus 1 (SIRV1)
and B116 from Sulfolobus turreted icosahedral virus (STIV). The structures
reveal an intriguing dimeric structure, with a large central pocket
flanked by conserved residues. B116 is also known to be important for
normal virus replication kinetics, as deletion of the gene results in a
marked ‘small plaque’ phenotype, consistent with an Acr function.


DUF1874 is a type III anti-CRISPR, AcrIII-1


To investigate a possible Acr function of DUF1874, we deleted the genes
for the type I-A CRISPR system in Sulfolobus islandicus M.16.4, so that
it had only a type III-B system for defence (Extended Data Fig. 2). We
challenged this strain with the archaeal virus SSeV (Fig. 1a), a lytic virus
isolated from Kamchatka, Russia, that has an exact CRISPR-spacer
match of 100% in M.16.4, as well as several other potentially active
CRISPR spacers. SSeV lacks a duf1874 gene and failed to form plaques
on a lawn of S. islandicus M.16.4 with type III-B CRISPR defence unless
the effector gene csx1 was deleted (Fig. 1a and Extended Data Fig. 2).
However, the same cells expressing the SIRV1 gp29 gene from a plasmid
were readily infected, giving rise to plaque formation. These data are
consistent with the hypothesis that SIRV1 gp29 functions as an Acr
specific for the type III CRISPR defence.

To explore this possibility further, we used a recently developed recombinant
type III CRISPR system from Mycobacterium tuberculosis; this
system allows the effector protein downstream of cOAs to be swapped in
order to provide effective immunity based on either cA₆ or cA₄ signalling
(Fig. 1). We then transformed strains capable of cA₄- or cA₆-based immunity
with a plasmid that was targeted for interference owing to a match
in its tetracycline-resistance gene to a spacer in the CRISPR array. We
observed efficient interference (lack of plasmid transformation) after one
day for either strain in the absence of the duf1874 gene from bacteriophage
THSA-485A (Fig. 1c, d). However, the presence of the duf1874 gene on the
Fig. 1 | DUF1874 is an anti-CRISPR protein specific for cA₄ signalling. a, SSeV
infection assay, showing that gp29 (a duf1874 gene from SIRV1) can neutralize
the type III-B CRISPR system in S. islandicus. We challenged S. islandicus
RJW007Δtype I-A or RJW007Δtype I-AΔcsx1 mutant strains with SSeV, in the
presence or absence of duf1874 (SIRV1 gp29) expressed on a replicative
plasmid. Plaques were observed when csx1 was deleted, or when the resistant
strain expressed duf1874 (n = 3 biological replicates) (Extended Data Fig. 2d).
b, Diagram showing the recombinant M. tuberculosis type III-A CRISPR
interference system established in E. coli. By swapping the native ancillary
nuclease Csm6 for Csx1, the system can be converted from cA₆- to cA₄-mediated
antiviral immunity. c, Plasmid transformation assay (after one day’s
growth), using a plasmid with a match to a spacer in the CRISPR array. If the
plasmid is successfully targeted by the CRISPR system, fewer transformants are
expected. Plasmids with or without the duf1874 gene were targeted
successfully when cA₆ (Csm6)-mediated antiviral signalling was active. By
contrast, cells using a cA₄ (Csx1)-based system reduced transformation only
when duf1874 was not present, suggesting that DUF1874 was effective in
neutralizing cA₄-based CRISPR interference. The control strain lacked
cOA-dependent ribonucleases. These results are representative of two biological
replicates, with four technical replicates each (n = 8). d, Colony counts for
transformants visible after one and four days’ growth in the presence or
absence of DUF1874 and the indicated effector proteins. DUF1874 antagonizes
Csx1- but not Csm6-mediated immunity. Data are mean and s.d. from two
biological replicates with four technical replicates each (n = 8).


plasmid reduced immunity for cA₄-mediated, but not cA₆-mediated,
CRISPR defence. This observation supports the hypothesis that DUF1874
acts as an Acr against cA₄-mediated type III CRISPR defence. We therefore
propose the collective name AcrIII-1 for this family. The ‘-’ in place of the
subtype reflects the fact that AcrIII-1 will inhibit any type III CRISPR subtype
that utilizes cA₄ molecules for defence. We also found that, after
four days of growth, Csm6-mediated immunity was lost, regardless of the
presence of DUF1874. This could indicate that alternative mechanisms
exist to remove cA₆ (Fig. 1d and Extended Data Fig. 3).


AcrIII-1 degrades cA₄ rapidly

To explore the mechanism of action of the AcrIII-1 family, we cloned
and expressed two family members in Escherichia coli: the SIRV1 gp29
protein and the YddF protein, encoded by an integrative and conjugative
element (ICE), Bs1, from Bacillus subtilis (Extended Data Fig. 1b).
We found that both proteins possess potent ring nuclease activity,
rapidly degrading cA₄ to generate linear di-adenylate (ApA>P) with a
cyclic 2′,3′-phosphate (Fig. 2 and Extended Data Fig. 4). With a catalytic
rate exceeding 5 min⁻¹, the Acr enzyme is at least 60-fold more active
than the cellular ring nuclease Crn1 from S. solfataricus. Both SIRV1
gp29 and YddF show a strong preference for cA₄ over cA₆, with the latter
being degraded very slowly by comparison (Extended Data Fig. 4). We
showed previously that the type III-D CRISPR effector of S. solfataricus
generates cA₄ in proportion to the amount of cognate target RNA
Fig. 2 | AcrIII-1 rapidly degrades cA₄ to linear products. a, Liquid
chromatography/high-resolution mass spectrometry analysis confirms that
AcrIII-1 SIRV1 gp29 converts cA₄ to A₂>P and A₂-P. The experiment was repeated
twice with similar results. RT, retention time. b, Kinetic comparison of cA₄
degradation by the AcrIII-1 enzymes SIRV1 gp29 and YddF and the cellular ring
nuclease Crn1. Values and error bars reflect means ± standard deviation (n = 3
technical replicates). c, The left panel shows the experimental protocol. On the
right, the top panel shows activation of Csx1 in a coupled assay containing type
III Csm complex activated with the indicated amounts of (unlabelled) target
RNA to initiate cA₄ synthesis. The control (C) reaction comprises Csx1 and
substrate RNA alone. Each set of three lanes thereafter is first in the absence
and then in the presence of a Crn1 protein (Sso2081) or an AcrIII-1 protein (SIRV1
gp29). Whereas AcrIII-1 degraded all cA₄ molecules generated using up to 50 nM
of the target RNA, the Crn1 enzyme deactivated Csx1 only when less than 5 nM
RNA was used. The bottom panel shows thin layer chromatography (TLC) of the
same reactions to visualize cA₄ production and degradation. Csx1 deactivation
correlated with complete cA₄ degradation (n = 3 technical replicates). For gel
source data, see Supplementary Fig. 1.


present. By varying the target RNA input and following cA₄ levels and
Csx1 activity, we compared the abilities of Crn1 and AcrIII-1 to destroy
the signalling molecule and deactivate the ancillary defence nuclease
Csx1. In keeping with its low turnover number, Crn1 was effective at
degrading cA₄ and thus deactivating Csx1 only at the lowest levels of
target RNA (Fig. 2c). By contrast, AcrIII-1 degraded cA₄ completely
at the highest target RNA concentration examined, preventing Csx1
activation. We investigated the ability of each enzyme to prevent Csx1
activation over a range of cA₄ concentrations spanning four orders of
magnitude (Extended Data Fig. 4e). Crn1 (2 μM) provided protection
only up to 5 μM cA₄, but 2 μM of AcrIII-1 provided complete protection at
the highest level of cA₄ tested (500 μM). Thus, AcrIII-1 has the potential
to destroy large concentrations of the second messenger cA₄ rapidly,
preventing activation of Csx1.
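
To give a feel for what the rate difference quoted above means in practice, the following sketch compares the two enzymes under a deliberately simple model. It assumes single-turnover, first-order decay of cA₄, treats the quoted "exceeding 5 min⁻¹" as exactly 5 min⁻¹ and "at least 60-fold" as exactly 60-fold, and so derives an illustrative Crn1 rate that is an assumption rather than a measured value from this study.

```python
import math

# Illustrative values only: the text gives bounds, not exact rate constants.
K_ACR = 5.0                 # min^-1, lower bound quoted for AcrIII-1
FOLD_DIFFERENCE = 60.0      # "at least 60-fold" quoted in the text
K_CRN1 = K_ACR / FOLD_DIFFERENCE  # implied Crn1 rate under these assumptions


def fraction_remaining(k_per_min, t_min):
    """Fraction of cA4 left after t_min minutes of first-order decay."""
    return math.exp(-k_per_min * t_min)


def half_life(k_per_min):
    """Time in minutes for half of the cA4 to be cleaved."""
    return math.log(2) / k_per_min


for name, k in (("AcrIII-1", K_ACR), ("Crn1", K_CRN1)):
    print(
        f"{name}: k = {k:.3f} min^-1, "
        f"half-life = {half_life(k):.2f} min, "
        f"cA4 left after 1 min = {fraction_remaining(k, 1.0):.1%}"
    )
```

Under these assumptions the half-lives differ by the same factor as the rates, which is one way to read why only the viral enzyme keeps pace with high cA₄ concentrations in the coupled assay.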


Structure and mechanism of AcrIII-1

The structure of AcrIII-1 is unrelated to that of proteins with the
CRISPR-associated Rossmann fold (CARF) domain, the only protein family
known thus far to bind cOA. To elucidate the mechanism of cA₄ binding
and cleavage by AcrIII-1, we co-crystallized an inactive variant (H47A)
of SIRV1 gp29 with cA₄, and solved the structure to 1.55 Å resolution
Fig. 3 | Structure of AcrIII-1 bound to cA₄. a, Superimposition of the apo SIRV1
gp29 structure (salmon) and the same protein in complex with cA₄ (purple),
highlighting the movement of the loop and α-helix upon cA₄ binding. cA₄ is
shown coloured by element. b, Surface representation of the structure of SIRV1
gp29 (purple) in complex with cA₄, emphasizing the complete burial of the
ligand. c, Surface representation of the apo structure of SIRV1 gp29 (salmon)
with cA₄ in the position observed in the structure of the complex, indicating
that the binding site is preformed. d, Structure of cA₄ bound to SIRV1 gp29. The


(Fig. 3 and Extended Data Table 1). The complex reveals a molecule of
cA₄ bound at the dimer interface. Comparison of the cA₄-bound and apo
structures reveals a substantial movement of a loop (comprising residues
82-94) and subsequent α-helix to bury cA₄ within the dimer. These
loops adopt variable or unstructured conformations in the various apo
protein structures. Once bound, the ligand is completely enclosed by
the protein, a considerable accomplishment when one considers the
relative sizes of protein and ligand (Fig. 3b). Superimposition of the
cA₄ ligand on the apo-protein structure reveals that the binding site is
largely preformed, with the exception of the mobile loops that form
the lid (Fig. 3c). The overall change is like two cupped hands catching
a ball, with the loops (fingers) subsequently closing around it.

The cA, molecule makes symmetrical interactions with each mono- 
mer of Acrlll-1 (Extended Data Fig. 5). Arginine R85 on the loop from 
one monomer interacts with the distant half of the cA, molecule and 
appears to ‘lock’ the closed dimer. Other important interactions are 
made with main-chain L92, 169 and N8, and side-chain R66, N8, Q81, S11, 
T50, S49 and N13, most of which are semi or fully conserved (Extended 
Data Figs. 1, 5), suggesting that they haveimportant rolesin cA, binding 
and/or catalysis in this whole family of enzymes. At two positions, on 
opposite sides of the ring, the 2’-hydroxyl of the ribose is positioned 
correctly for in-line attack on the phosphodiester bond, consistent 
with the observed bilateral cleavage (Fig. 3d). The catalytic power of 
the Acrlll-1 family probably derives from active-site residues that posi- 
tion the 2’-hydroxyl group for in-line nucleophilic attack, stabilize 
the transition state and protonate the oxyanion leaving group”. For 
the AcrIll-1 family, the absolutely conserved residue H47 is suitably 
positioned to act as a general acid and fulfil the latter role (Fig. 3d). To 
test this hypothesis, we assayed variant H47A of Acrilll-1. The variant 
enzyme suffered a more than 2,500-fold decrease in catalytic power, 
which could be partially reversed by chemical rescue with 500 mM imi- 
dazole inthe reaction buffer (Extended Data Fig. 6). We also noted that 
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two active-site histidine residues (H47A and H47A’, from each monomer of the 
dimer; modelled on the basis of the position of the alanine side chain inthe 
H47A variant crystallized with cA,, and coloured to represent residues from 
different monomers) are in suitable positions to act as the general acid, 
protonating the oxyanion leaving group. The corresponding ribose sugars 
have 2’-hydroxyl groups suitably positioned for in-line nucleophilic attack on 
the phosphodiester bond. Inthe cA, ligand, carbon atoms are shown in green, 
phosphates in orange, oxygens in red and nitrogens in blue. 


the conserved residue E88, situated on the tip of the loop that covers 
the binding site, is positioned close to the H47 residue of the opposite 
subunit. When mutated to alanine, the catalytic rate was reduced by 
84-fold to 0.064 min™ (Extended Data Fig. 6b), consistent with a role 
for E88 in positioning H47 and/or increasing the pKa of the catalytic 
histidine residue to enhance catalysis”. 
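The fold changes quoted for the variants can be turned into approximate rate constants with a little arithmetic. The sketch below assumes that both fold reductions are expressed relative to the same wild-type rate, which the text implies but does not state outright; the variable names are illustrative only.

```python
# Values quoted in the text for SIRV1 gp29 variants.
E88A_RATE_PER_MIN = 0.064        # measured rate of the E88A variant
E88A_FOLD_REDUCTION = 84         # assumed to be relative to wild type
H47A_MIN_FOLD_REDUCTION = 2500   # "more than 2,500-fold" for H47A


def implied_reference_rate(variant_rate, fold_reduction):
    """Rate of the reference enzyme implied by a variant rate and its fold reduction."""
    return variant_rate * fold_reduction


# Wild-type rate implied by the E88A numbers (an inference, not a reported value).
wild_type_rate = implied_reference_rate(E88A_RATE_PER_MIN, E88A_FOLD_REDUCTION)

# "More than 2,500-fold" slower gives only an upper bound on the H47A rate.
h47a_upper_bound = wild_type_rate / H47A_MIN_FOLD_REDUCTION

print(f"implied wild-type rate: {wild_type_rate:.2f} per min")
print(f"H47A rate upper bound: {h47a_upper_bound:.4f} per min")
```

On these assumptions the wild-type enzyme turns over on the order of a few events per minute under the assay conditions, whereas H47A is limited to roughly one event every several hours.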

By targeting a key signalling molecule, a single AcrIII-1 enzyme should
have broad utility in the inhibition of endogenous cA₄-specific type
III CRISPR systems in any species. Of the CRISPR ancillary nucleases
studied to date, most are activated by cA₄; activation by cA₆ appears
to be limited to certain bacterial phyla, including the Firmicutes and
Actinobacteria”. Recently, a type III Acr (AcrIIIB1) has been reported
that appears to function by binding and inhibiting the type III-B effector
complex”®. Two other Acr proteins with enzymatic functions have been
described: AcrVA1, which catalyses CRISPR RNA (crRNA)-mediated
cleavage of Cas12a”’, and AcrVA5, which acetylates the site in Cas12a
that senses the protospacer-adjacent motif (PAM) of target DNA”’.
These and other Acrs target a protein (or protein/nucleic acid complex),
implying a requirement for specific interactions that could be evaded
by sequence variation. This is not a limitation of AcrIII-1.


Phylogenetic analysis of AcrIII-1

The gene encoding AcrIII-1 is found in representatives of at least five
distinct viral families, making it one of the most widely conserved of all
archaeal virus proteins” (Extended Data Fig. 1 and Supplementary Data 1).
The distribution of AcrIII-1 in archaea is sporadic but covers most of
the main lineages (Supplementary Data 1), and the gene is typically
adjacent to open reading frames (ORFs) from mobile genetic elements
rather than CRISPR loci. A good example is the STIV integrated into
S. acidocaldarius genomes”’. AcrIII-1 is also present in several
bacteriophages of the order Caudovirales, and there are many instances
of acrIII-1 genes in sequenced bacterial genomes, with homologues found
in the Firmicutes, Cyanobacteria, Proteobacteria, Actinobacteria and
other phyla (Supplementary Data 1). Maximum likelihood phylogenetic
analysis of the AcrIII-1 proteins suggests multiple horizontal gene
transfers between unrelated viruses, as well as between bacteria and
archaea (Extended Data Figs. 7-9). Sometimes the acrIII-1 gene is clearly
part of an integrated mobile genetic element, such as the yddF gene in
B. subtilis’. However, in other species (n = 49) the gene is associated
with cellular type III CRISPR systems. In Marinitoga piezophila, AcrIII-1
is fused to a cOA-activated HEPN RNase of the Csx1 family. Given that
both active sites are conserved, this fusion protein may have
cA₄-activated RNase activity coupled with a cA₄-degradative ring
nuclease, thus providing an explicit linkage between the AcrIII-1 family
and the type III CRISPR system. In this context the enzyme is likely to
be acting as a host-encoded ring nuclease, like Crn1, rather than an Acr.
We therefore propose the family name of Crn2 (CRISPR-associated ring
nuclease 2) to cover DUF1874-family members that are associated with
type III CRISPR systems (Extended Data Fig. 8).


Cyclic nucleotides in defence systems 


AcrIII-1 is, to our knowledge, the first Acr to be predicted to have
functional roles in both ‘offense and defence’. It remains to be determined
whether the acrIII-1 gene arose in viruses and was appropriated by
cellular type III systems, or vice versa. However, the extremely broad
distribution of acrIII-1 and limited distribution of crn2 suggest the
former. Adoption of an anti-CRISPR protein as a component of a cellular
CRISPR defence system seems counterintuitive. However, the enzyme could
have been harnessed for a role in defence by putting it under tight
transcriptional control so that it is expressed at appropriate times or
levels. The unprecedentedly wide occurrence of this Acr across many
archaeal and bacterial virus families reflects the fact that this enzyme
degrades a key signalling molecule to subvert cellular immunity. This
makes it very hard for cells to evolve counter-resistance, other than by
switching to a different signalling molecule. Recent discoveries have
highlighted the existence of diverse cellular defence systems involving
cyclic nucleotide signalling in bacteria” ©. It is possible that cOAs,
and the enzymes that metabolize them, have functions that extend beyond
type III CRISPR systems. The identification here of a new class of
cA₄-binding proteins highlights the potential for further discoveries in
this area.
Methods


Construction of S. islandicus strains 
The type I-A CRISPR defence module, which includes seven genes
(cas3b, csa5, cas7, cas5, cas3′, cas3″ and casX*'), was in-frame deleted
from the genetic host S. islandicus RJW007, derived from wild-type
strain S. islandicus M.16.4 carrying a double pyrEF and argD deletion™,
by using a modified plasmid integration and segregation knockout
strategy*, in line with the methodology developed in ref. **. The
resultant type I-A deletion mutant (RJW007Δtype I-A) was then used as a
parental strain to further delete the csx1 gene, generating the mutant
strain RJW007Δtype I-AΔcsx1. Mutant strains were confirmed by
polymerase chain reaction (PCR) analysis using primers that bind outside
of the homologous flanking arms of genes to be deleted.
Synthesized SIRV1 gp29 gene was purchased from Integrated DNA
Technologies (IDT), Coralville, USA as a g-block, and was cloned into a
Sulfolobus-E. coli shuttle vector, pSeSd-SsoargD” (referred to as pOE
hereafter), at the NdeI and NotI sites, generating the gp29 expression
plasmid pOE-gp29 in which the gp29 gene was placed under the control
of the arabinose promoter. The pOE-gp29 and pOE plasmids were then
transformed into competent cells of the Δtype I-A mutant and Δtype
I-AΔcsx1 mutant via electroporation as described”, generating strains
expressing and not expressing SIRV1 gp29, respectively.


Viral quantification 

The genome sequence of SSeV is available in GenBank (accession code
MN53972). To calculate the titre of SSeV, we co-incubated 100 µl diluted
virus (10°, 10°° and 107) with 500 µl S. islandicus Y08.82.36 host”
(tenfold concentrated) without shaking at 76-78 °C for 30 min. Afterwards,
we transferred the virus-infected cells into a glass test tube containing
5 ml of prewarmed sucrose-yeast extract (SY) and 0.8% gelrite mixture,
and plated onto SY plates. The plates were put into a plastic bag, and
incubated for two days at 76-78 °C. We counted plaques in plates with
proper virus dilutions, and determined the titre of SSeV to be 4.96 x 10°
plaque-forming units (PFUs) per millilitre.


SSeV infection of S. islandicus with or without type III CRISPR 

We carried out the SSeV infection assay as described”°, with minor
modifications. In brief, approximately 6 × 10⁸ S. islandicus M.16.4 cells
taken from the exponential stage were spun down at 4,000 r.p.m. for
12 min, and resuspended in 1 ml of arabinose-tryptone (AT) medium. The
resuspensions were then co-incubated with 20 ml of fresh AT medium
or SSeV supernatant at different dilutions (10°, 107, 10°, 10°, 10+,
10° and 10°) in a Falcon tube at 76-78 °C for 1 h without shaking. The
SSeV-infected cells were washed twice with 10 ml of AT medium and
resuspended into 500 µl of AT medium. Afterwards, the concentrated
SSeV-infected cells were mixed with 5 ml of top layer (2.5 ml of 2×
arabinose-yeast extract (AY) medium plus 2.5 ml of 0.8% gelrite), and then
plated onto AY plates. PFUs were counted after four days of incubation
at 76-78 °C. Three independent experiments were performed.


Cloning and purification 

For cloning, we purchased synthetic genes (g-blocks) from IDT, and
cloned them into the vector pEhisV5spacerTev between the NcoI and
BamHI sites®. Competent DH5α (E. coli) cells were transformed with the
construct, and sequence integrity confirmed by sequencing (Eurofins
Genomics). The plasmid was transformed into E. coli C43 (DE3) cells
for protein expression. Cloning of AcrIII-1 SIRV1 gp29, Crn1 Sso2081
and SsoCsx1 has been described previously”. For expression of SIRV1
gp29 and Bacillus subtilis YddF, we grew 2 l of Luria-Broth (LB) culture
at 37 °C to an OD₆₀₀ of 0.8 with shaking at 180 r.p.m. Protein expression
was induced with 0.4 mM isopropyl β-D-1-thiogalactopyranoside, and
cells were grown at 25 °C overnight before harvesting by centrifugation
at 4,000 r.p.m. (Beckman Coulter Avanti JXN-26; JLA8.1 rotor) at
4 °C for 15 min.


For protein purification, we resuspended the cell pellet in four
volumes equivalent of lysis buffer containing 50 mM Tris-HCl pH 7.5,
0.5 M NaCl, 10 mM imidazole and 10% glycerol supplemented with EDTA-free
protease-inhibitor tablets (Roche; one tablet per 100 ml buffer) and
lysozyme (1 mg ml⁻¹). Cells were lysed by sonicating six times for one
minute with one-minute rest intervals on ice at 4 °C, and the lysate was
ultracentrifuged at 40,000 r.p.m. (70 Ti rotor) at 4 °C for 35 min. The
lysate was then loaded onto a 5 ml HisTrap FF Crude column (GE
Healthcare) equilibrated with wash buffer containing 50 mM Tris-HCl
pH 7.5, 0.5 M NaCl, 30 mM imidazole and 10% glycerol. Unbound protein was
washed away with 20 column volumes of wash buffer, before elution of
histidine-tagged protein using a linear gradient (holding at 10% for three
column volumes, and 50% for three column volumes) of elution buffer
containing 50 mM Tris-HCl pH 7.5, 0.5 M NaCl, 0.5 M imidazole and 10%
glycerol. We carried out SDS-polyacrylamide gel electrophoresis (PAGE)
to identify fractions containing the protein of interest, and pooled and
concentrated relevant fractions using a 10 kDa molecular mass cut-off
centrifugal concentrator (Merck). The histidine tag was removed by
incubating concentrated protein overnight with tobacco etch virus
(TEV) protease (1 mg per 10 mg protein) while dialysing in buffer
containing 50 mM Tris-HCl pH 7.5, 0.5 M NaCl, 30 mM imidazole and 10%
glycerol at room temperature. The protein with histidine tag removed
was isolated using a 5 ml HisTrap FF column, eluting the protein using
four column volumes of wash buffer. Histidine-tag-removed protein
was further purified by size-exclusion chromatography (S200 16/60;
GE Healthcare) in buffer containing 20 mM Tris-HCl pH 7.5, 0.125 M
NaCl using an isocratic gradient. After SDS-PAGE, fractions containing
protein of interest were concentrated and protein was aliquoted and
stored at −80 °C. We generated variant enzymes using the QuikChange
site-directed mutagenesis kit as per the manufacturer’s instructions
(Agilent Technologies), and purified them as for the wild-type proteins.


Radiolabelled cA₄-cleavage assays

We generated cOA by incubating 120 µg Sulfolobus solfataricus (Sso)
type III-D (Csm) complex with 5 nM α-³²P-ATP, 1 mM ATP, 120 nM A26
RNA target and 2 mM MgCl₂ in Csx1 buffer containing 20 mM
2-(N-morpholino)ethanesulfonic acid (MES) pH 5.5, 100 mM K-glutamate,
1 mM dithiothreitol (DTT) and three units SUPERase•In Inhibitor for 2 h
at 70 °C in a 100 µl reaction volume. We extracted cOA through
phenol-chloroform (Ambion) extraction followed by chloroform extraction
(Sigma-Aldrich), with storage at −20 °C.

For single-turnover kinetics experiments, we assayed AcrIII-1 SIRV1
gp29 and variants (4 µM protein dimer) for radiolabelled cA₄
degradation by incubating with 1/400 diluted ³²P-labelled SsoCsm cOA
(roughly 200 nM cA₄, generated in a 100 µl cOA-synthesis reaction
as above) in Csx1 buffer supplemented with 1 mM EDTA at 50 °C. We
incubated AcrIII-1 YddF (8 µM dimer) with cOA in buffer containing
20 mM MES pH 6.0, 100 mM NaCl, 1 mM DTT, 1 mM EDTA and three
units SUPERase•In Inhibitor at 37 °C. We incubated Crn1 Sso2081 (4 µM
dimer) with cOA in buffer containing 20 mM Tris-HCl pH 8.0, 100 mM
NaCl, 1 mM EDTA, 1 mM DTT and three units SUPERase•In Inhibitor
at 50 °C. For SIRV1 gp29 H47A chemical rescue, reactions were
supplemented with 0.5 M imidazole. Two experimenters were involved
in kinetic experiments involving five-second time points. At desired
time points, a 10 µl aliquot of the reaction was removed and quenched
by adding to phenol-chloroform and vortexing. Subsequently, 5 µl of
deproteinized reaction product was extracted into 5 µl 100% formamide
xylene-cyanol loading dye if intended for denaturing PAGE, or products
were further isolated by chloroform extraction if intended for thin-layer
chromatography (TLC). A reaction incubating cOA in buffer without
protein to the endpoint of each experiment was included as a negative
control. All experiments were carried out in triplicate. For SIRV1 gp29,
two biological samples were assayed in triplicate. We visualized cA₄
degradation by phosphor imaging following denaturing PAGE (7 M
urea, 20% acrylamide, 1× Tris/borate/EDTA (TBE)) or TLC.


For TLC, we spotted 1 µl of radiolabelled product 1 cm from the
bottom of a 20 × 20 cm silica gel TLC plate with fluorescence indicator
254 nm (Supelco Sigma-Aldrich). We placed the TLC plate in a sealed
glass chamber prewarmed and humidified at 37 °C and containing
0.5 cm of a running buffer composed of 30% water, 70% ethanol and
0.2 M ammonium bicarbonate, pH 9.2. The temperature was lowered to
35 °C and the buffer was allowed to rise along the plate through capillary
action until the migration front reached 17 cm. The plate was air dried
and sample migration was visualized by phosphor imaging.

To examine degradation of cA, and cA, by AcrIII-1 proteins, we
incubated unlabelled cA, or cA, (450 µM, BIOLOG Life Science Institute,
Bremen, Germany) with SIRV1 gp29 or YddF (40 µM dimer), in reaction
buffers described above, at 70 °C and 37 °C, respectively. Reactions
were quenched at the indicated time points and prepared for
TLC as above. We visualized reaction substrate and products, which
block fluorescence of the indicator on the plate, under shortwave UV
light (254 nm) and photographed the plates using a 12-megapixel,
f/1.8-aperture camera.

For kinetic analysis, we quantified cA₄ cleavage using the Bio-Formats
plugin³⁶ of ImageJ as distributed in the Fiji package³⁷ and fitted the data
to a single exponential curve (y = m1 + m2*(1 − exp(−m3*x)); m1 = 0.1,
m2 = 1 and m3 = 1) using Kaleidagraph (Synergy Software), as before³⁸.
We obtained the cA₄-cleavage rate by the H47A variant in the absence
of imidazole by linear fit. Raw data for kinetic analyses are available
in Supplementary Data 2.
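The single exponential above can be fitted with standard tools. As an illustration only, here is a minimal pure-Python sketch of one way to do it. It is not the Kaleidagraph routine used for the analysis: it assumes that m1, m2 and m3 are the offset, amplitude and rate constant, that the quoted values of 0.1, 1 and 1 are starting guesses (not needed by this approach), and it uses a hypothetical noise-free time course. For a fixed rate the model is linear in m1 and m2, so the sketch solves those two by least squares and searches over the rate alone.

```python
import math


def model(x, m1, m2, m3):
    """Single exponential: y = m1 + m2 * (1 - exp(-m3 * x))."""
    return m1 + m2 * (1.0 - math.exp(-m3 * x))


def _amplitudes_for_rate(times, fractions, m3):
    """Least-squares m1 and m2 for a fixed rate m3; returns (m1, m2, sse)."""
    basis = [1.0 - math.exp(-m3 * t) for t in times]
    n = len(times)
    sb = sum(basis)
    sbb = sum(b * b for b in basis)
    sy = sum(fractions)
    sby = sum(b * y for b, y in zip(basis, fractions))
    det = n * sbb - sb * sb
    if det == 0.0:  # degenerate basis: fall back to a flat line
        m1, m2 = sy / n, 0.0
    else:
        m2 = (n * sby - sb * sy) / det
        m1 = (sy - m2 * sb) / n
    sse = sum((y - m1 - m2 * b) ** 2 for b, y in zip(basis, fractions))
    return m1, m2, sse


def fit_single_exponential(times, fractions, log10_lo=-4.0, log10_hi=3.0):
    """Fit m1, m2, m3 by a coarse scan plus golden-section search on log10(m3)."""
    def sse_at(log_rate):
        return _amplitudes_for_rate(times, fractions, 10.0 ** log_rate)[2]

    steps = 70
    grid = [log10_lo + (log10_hi - log10_lo) * i / steps for i in range(steps + 1)]
    best = min(range(len(grid)), key=lambda i: sse_at(grid[i]))
    a = grid[max(best - 1, 0)]
    b = grid[min(best + 1, steps)]

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    for _ in range(80):
        if sse_at(c) < sse_at(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)

    m3 = 10.0 ** ((a + b) / 2.0)
    m1, m2, _ = _amplitudes_for_rate(times, fractions, m3)
    return m1, m2, m3


# Hypothetical, noise-free time course (minutes, fraction of cA4 cleaved).
times = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
fractions = [model(t, 0.02, 0.90, 0.80) for t in times]
m1, m2, m3 = fit_single_exponential(times, fractions)
print(f"m1={m1:.3f} m2={m2:.3f} rate m3={m3:.3f} per min")
```

With real gel quantifications the data are noisy and the rate is only as good as the sampling of the early time points, which is presumably why very slow variants such as H47A were fitted linearly instead.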


Deactivation of HEPN nucleases by ring nucleases 

In the absence or presence of Crn1 Sso2081 (2 µM dimer) or AcrIII-1
SIRV1 gp29 (2 µM dimer), we incubated 4 µg S. solfataricus Csm
complex (roughly 140 nM Csm carrying crRNA targeting A26 RNA target)
with A26 RNA target (50 nM, 20 nM, 5 nM, 2 nM or 0.5 nM) in buffer
containing 20 mM MES pH 6.0, 100 mM NaCl, 1 mM DTT and three units
SUPERase•In Inhibitor supplemented with 2 mM MgCl₂ and 0.5 mM ATP
at 70 °C for 60 min. We added 5′-end ³²P-labelled A1 RNA
(5′-AGGGUAUUAUUUGUUUGUUUCUUCUAAACUAUAAGCUAGUUCUGGAGA-3′)
and 0.5 µM dimer SsoCsx1 to the reaction at 60 min, and allowed the
reaction to proceed for a further 60 min before quenching by adding
phenol-chloroform. We visualized A1 RNA cleavage by phosphor imaging
after denaturing PAGE. A control reaction incubating SsoCsx1 with
A1 RNA in the absence of cOA was carried out to determine SsoCsx1
background activity. We visualized cA₄ synthesis by Csm in response
to A26 target RNA, and subsequent cA₄ degradation in the presence
of Crn1 Sso2081 or AcrIII-1 SIRV1 gp29, by adding 5 nM α-³²P-ATP with
0.5 mM ATP at the start of the reaction. Reactions were quenched at
60 min with phenol-chloroform, and cA₄ degradation products were
visualized by phosphor imaging following TLC. We also carried out a
control reaction incubating Csm with ATP and α-³²P-ATP in the absence
of A26 target RNA, quenching the reaction after 60 min.

We determined the cA₄-degradation capacity of AcrIII-1 SIRV1 gp29
and of the Crn1 enzyme Sso2081 by incubating 2 µM dimer of each
enzyme with 500-0.5 µM unlabelled cA₄ (BIOLOG Life Science Institute,
Bremen, Germany) in Csx1 buffer at 70 °C for 20 min before introducing
SsoCsx1 (0.5 µM dimer) and ³²P-labelled A1 RNA (50 nM). The reaction
was left to proceed for a further 60 min at 70 °C before quenching by
adding phenol-chloroform. Deproteinized products were separated
by denaturing PAGE to visualize RNA degradation.


Plasmid immunity from a reprogrammed type III system

Plasmids pCsm1-5_ΔCsm6 (containing the type III Csm interference
genes cas10, csm3, csm4 and csm5 from M. tuberculosis and csm2
from M. canettii), pCRISPR_TetR (containing M. tuberculosis cas6 and
a tetracycline-resistance-gene-targeting CRISPR array), pRAT-Target
(tetracycline-resistance plus target plasmid) and M. tuberculosis (Mtb)
Csm6/Thioalkalivibrio sulfidiphilus (Tsu) Csx1 expression constructs
have been described previously”. pRAT-Duet was constructed by
replacing the pUC19 lacZα gene of pRAT-Target with the multiple cloning
sites (MCSs) of pACYCDuet-1 by restriction digest (5′-NcoI, 3′-XhoI).
The viral ring nuclease (duf1874) gene from Thermoanaerobacterium
phage THSA_485A, tsac_2833, was PCR-amplified from its pEHisTEV
expression construct and cloned into the 5′-NdeI, 3′-XhoI sites of MCS-2.
The cOA-dependent nuclease genes (mtb csm6, tsu csx1) were cloned
into the 5′-NcoI, 3′-SalI sites of MCS-1 by restriction digest from their
respective expression constructs. Each nuclease was cloned with and
without the viral ring nuclease; pRAT-Duet without insert and
pRAT-Duet containing only the viral ring nuclease were used as controls.
We carried out the plasmid transformation assay essentially as described”.
E. coli C43 containing pCsm1-5_ΔCsm6 and pCRISPR_TetR were
transformed by heat shock with 100 ng of pRAT-Duet target plasmid
containing different combinations of cOA-dependent nuclease and viral
ring nuclease. After outgrowth at 37 °C for 2 h, cells were collected and
resuspended in 200 µl LB. A series of tenfold dilutions was applied onto
LB agar containing 100 µg ml⁻¹ ampicillin and 50 µg ml⁻¹ spectinomycin
to determine the cell density of the recipient cells and onto LB agar
additionally containing 25 µg ml⁻¹ tetracycline, 0.2% (w/v) D-lactose and
0.2% (w/v) L-arabinose to determine the cell density of viable
transformants. Plates were incubated at 37 °C for 16-18 h; further
incubation was carried out at room temperature. Colonies were counted
manually and corrected for dilution and volume to obtain colony-forming
units (CFUs) per millilitre. Raw data for plasmid counts are available
in Supplementary Data 3.


Liquid chromatography/high-resolution mass spectrometry 

We incubated AcrIII-1 SIRV1 gp29 (40 µM dimer) with 400 µM cA₄ in
Csx1 buffer for 2 min at 70 °C, and carried out deproteinization by
phenol-chloroform extraction followed by chloroform extraction.
Liquid chromatography/high-resolution mass spectrometry (LC-HRMS)
analysis was performed on a Thermo Scientific Velos Pro instrument
equipped with HESI source and Dionex UltiMate 3000 chromatography
system. Compounds were separated on a Kinetex EVO C18 column
(2.6 µm, 2.1 × 50 mm; Phenomenex) using the following gradient of
acetonitrile (B) against 20 mM ammonium bicarbonate (A): 0-2 min
2% B, 2-10 min 2-8% B, 10-11 min 8-95% B, 11-14 min 95% B, 14-15 min
95-2% B, 15-20 min 2% B, at a flow rate of 300 µl min⁻¹ and column
temperature of 40 °C. UV data were recorded at 254 nm. Mass data were
acquired on a Fourier transform mass analyser in negative-ion mode,
with scan range m/z 150-1,500 at a resolution of 30,000. We set the
source voltage to 3.5 kV, the capillary temperature to 350 °C, and the
source heater temperature to 250 °C. Data were analysed using Xcalibur
(Thermo Scientific).
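The solvent programme above is easier to check when written as a table of breakpoints. The sketch below encodes the stated segments and returns the percentage of acetonitrile (B) at a given time, assuming that each ramp is linear between its end points; that linearity, and the names used, are assumptions made for illustration.

```python
# (time in min, % B) breakpoints taken from the gradient described above.
GRADIENT = [
    (0.0, 2.0),
    (2.0, 2.0),
    (10.0, 8.0),
    (11.0, 95.0),
    (14.0, 95.0),
    (15.0, 2.0),
    (20.0, 2.0),
]


def percent_b(t_min):
    """Percentage of solvent B at time t_min, by linear interpolation."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-20 min programme")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole programme")


for t in (1.0, 6.0, 10.5, 12.0, 18.0):
    print(f"{t:5.1f} min: {percent_b(t):5.1f}% B")
```

The shallow 2-8% segment between 2 and 10 min is where polar, nucleotide-like analytes would be expected to separate on a C18 column; the steep ramp to 95% B then serves as a column wash before re-equilibration.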


Phylogenetic analysis 

AcrIII-1 homologues were collected by using gp29 (NP_666617) of
SIRV1 as a query and running two iterations (E=1x 10°) of PSI-BLAST³⁹
against the non-redundant protein database at the National Center
for Biotechnology Information (NCBI). Sequences were aligned using
PROMALS3D⁴⁰. Redundant sequences (95% identity threshold) and
sequences with a mutated active-site residue H47 were removed from
the alignment. Poorly aligned (low information content) positions
were removed using the gt 0.2 function of trimAl⁴¹. The final alignment
contained 124 positions. The maximum likelihood phylogenetic tree
was constructed using PhyML⁴² with automatic selection of the best-fit
substitution model for a given alignment. The best model identified
by PhyML was LG + G + I. We assessed branch support using aBayes
implemented in PhyML, and visualized the tree using iTOL⁴³.
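The column-trimming step can be illustrated with a small stand-alone function. This is a sketch of the idea, not trimAl itself: it assumes that a gap threshold of 0.2 means a column is kept only if at least 20% of the sequences carry a residue (not a gap) at that position, and the toy alignment is invented for the example.

```python
def trim_gappy_columns(alignment, gap_threshold=0.2, gap_chars="-."):
    """Drop alignment columns in which too few sequences have a residue.

    alignment is a list of equal-length aligned sequences. A column is kept
    when the fraction of non-gap characters is at least gap_threshold.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    keep = [
        col
        for col in range(length)
        if sum(seq[col] not in gap_chars for seq in alignment) / n >= gap_threshold
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


# Toy alignment: the third column has a residue in only one of six sequences.
toy = ["MK-L", "MK-L", "M--L", "MR-I", "M-AL", "M--V"]
trimmed = trim_gappy_columns(toy, gap_threshold=0.2)
print(trimmed)
```

Columns that survive can still contain many gaps; a low threshold such as 0.2 only removes positions that are present in a small minority of sequences.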


Crystallization 

The AcrIII-1 H47A variant was concentrated to 10 mg ml⁻¹, incubated at
293 K for 1 h with a 1.2 M excess of cA₄, and centrifuged at 13,000 r.p.m.
for 10 min before crystallization. Sitting drop vapour diffusion
experiments were set up at the nanolitre scale using commercially available
and in-house crystallization screens and incubated at 293 K. Crystals
appeared in various conditions, but those used for data collection
grew from 40% 2-methyl-2,4-pentanediol, 5% polyethylene glycol
8000 and 0.1 M sodium cacodylate, pH 6.5. Crystals were harvested
and transferred briefly into cryoprotectant containing mother liquor
with 20% glycerol immediately before cryo-cooling in liquid nitrogen.
We used the H47A variant to avoid cleavage of the cA₄ substrate during
co-crystallization. The position of the active-site histidine was inferred
from the structure of the apo-protein.


Data collection and processing 

X-ray data were collected from two crystals at 100 K, at a wavelength
of 0.9686 Å, on beamline I24 at the Diamond Light Source, to 1.49 Å and
1.60 Å resolution. Both data sets were automatically processed with
Xia2⁴⁴, using XDS and XSCALE⁴⁵. The data were merged in Aimless⁴⁶
and the overall resolution truncated to 1.55 Å. The data were phased by
molecular replacement using Phaser⁴⁷, with a monomer from PDB file
2X4 stripped of water molecules as the search model. Model refinement
of AcrIII-1 was achieved by iterative cycles of REFMAC5⁴⁸ in the
CCP4 suite⁴⁹ and manual manipulation in COOT⁵⁰. Electron density for
cA₄ was clearly visible in the maximum likelihood/σA-weighted Fobs − Fcalc
electron-density map at 3σ. The coordinates for cA₄ were generated in
ChemDraw (Perkin Elmer) and the library was generated using acedrg⁵¹,
before fitting of the molecule in COOT. Model quality was monitored
throughout using Molprobity⁵² (score 1.13; centile 99). Ramachandran
statistics were 98.5% favoured, 0% disallowed. Data and refinement
statistics are shown in Extended Data Table 1.


Sample size and randomization 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The structural coordinates and data have been deposited in the Protein 
Data Bank (PDB) with deposition code 6SCF. The genome sequence 
of the SSeV virus has been submitted to GenBank with accession code 
MN53972. Raw data are available in the Supplementary Information 
for the plasmid immunity analysis presented in Fig. 1 and Extended 
Data Fig. 3, and the kinetic analysis presented in Fig. 2 and Extended 
Data Figs. 5, 6. 


31. Held, N. L., Herrera, A. & Whitaker, R. J. Reassortment of CRISPR repeat-spacer loci in 
Sulfolobus islandicus. Environ. Microbiol. 15, 3065-3076 (2013). 

32. Zhang, C. & Whitaker, R. J. Microhomology-mediated high-throughput gene inactivation 
strategy for the hyperthermophilic crenarchaeon Sulfolobus islandicus. Appl. Environ. 
Microbiol. 84, e02167-17 (2017). 

33. Zhang, C., Cooper, T. E., Krause, D. J. & Whitaker, R. J. Augmenting the genetic toolbox for 
Sulfolobus islandicus with a stringent positive selectable marker for agmatine 
prototrophy. Appl. Environ. Microbiol. 79, 5539-5549 (2013). 


34. Deng, L., Zhu, H., Chen, Z., Liang, Y. X. & She, Q. Unmarked gene deletion and host-vector 
system for the hyperthermophilic crenarchaeon Sulfolobus islandicus. Extremophiles 13, 
735-746 (2009). 

35. Rouillon, C., Athukoralage, J. S., Graham, S., Gruschow, S. & White, M. F. Investigation of 
the cyclic oligoadenylate signaling pathway of type III CRISPR systems. Methods 
Enzymol. 616, 191-218 (2019). 

36. Linkert, M. et al. Metadata matters: access to image data in the real world. J. Cell Biol. 189, 
777-782 (2010). 

37. Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. 
Methods 9, 676-682 (2012). 

38. Sternberg, S. H., Haurwitz, R. E. & Doudna, J. A. Mechanism of substrate selection by a 
highly specific CRISPR endoribonuclease. RNA 18, 661-672 (2012). 

39. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database 
search programs. Nucleic Acids Res. 25, 3389-3402 (1997). 

40. Pei, J. & Grishin, N. V. PROMALS3D: multiple protein sequence alignment enhanced with 
evolutionary and three-dimensional structural information. Methods Mol. Biol. 1079, 
263-271 (2014). 

41. Capella-Gutiérrez, S., Silla-Martinez, J. M. & Gabaldon, T. trimAl: a tool for automated 
alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972-1973 
(2009). 

42. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood 
phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307-321 (2010). 

43. Letunic, I. & Bork, P. Interactive tree of life (iTOL) v4: recent updates and new
developments. Nucleic Acids Res. 47 (W1), W256-W259 (2019).

44. Winter, G. xia2: an expert system for macromolecular crystallography data reduction. J. 
Appl. Crystallogr. 43, 186-190 (2010). 

45. Kabsch, W. Xds. Acta Crystallogr. D 66, 125-132 (2010). 

46. Evans, P. R. An introduction to data reduction: space-group determination, scaling and 
intensity statistics. Acta Crystallogr. D 67, 282-292 (2011). 

47. McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658-674 
(2007). 

48. Murshudov, G. N., Vagin, A. A. & Dodson, E. J. Refinement of macromolecular structures 
by the maximum-likelihood method. Acta Crystallogr. D 53, 240-255 (1997). 

49. Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Crystallogr. 
D 67, 235-242 (2011). 

50. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. 
Acta Crystallogr. D 66, 486-501 (2010). 

51. Long, F. et al. AceDRG: a stereochemical description generator for ligands. Acta 
Crystallogr. D 73, 112-122 (2017). 

52. Chen, V.B. et al. MolProbity: all-atom structure validation for macromolecular 
crystallography. Acta Crystallogr. D 66, 12-21 (2010). 

53. Gerlt, J. A. Genomic enzymology: web tools for leveraging protein family sequence- 
function space and genome context to discover novel functions. Biochemistry 56, 
4293-4308 (2017). 


Acknowledgements This work was supported by grants from the Biotechnology and
Biological Sciences Research Council (BB/S000313/1 to M.F.W. and BB/R008035/1 to T.M.G.)
and by a NASA Exobiology and Evolutionary Biology grant (NNX14AK23G to R.J.W.). We thank
J. Black and M. Alejandra-Bautista for isolating and characterizing the SSeV virus, and
R. Wipfler and W. Zhu for technical assistance.


Author contributions J.S.A. designed experiments and carried out enzyme assays and 
analysis; S.A.M. carried out structural biology; C.Z. constructed the S. islandicus strains and 
performed virus infection assays; Sabine Gruschow carried out plasmid transformation assays 
and mass spectrometry; Shirley Graham generated expression plasmids and purified proteins; 
M.K. contributed to the conception of the project and performed phylogenetic analysis; T.M.G., 
R.J.W. and M.F.W. oversaw the work, analysed the data and wrote the manuscript. All authors 
contributed to data analysis and writing. 


Competing interests The University of St Andrews has filed a patent application (UK Patent 
Application 1902256.5, “Novel enzyme for phage therapy”; filed 19 February 2019), on which 
J.S.A. and M.F.W. are inventors. The other authors declare no competing interests.


Additional information 

Supplementary information is available for this paper at
https://doi.org/10.1038/s41586-019-1909-5.

Correspondence and requests for materials should be addressed to T.M.G. or M.F.W. 

Peer review information Nature thanks Joseph Bondy-Denomy and John van der Oost for their 
contribution to the peer review of this work. 

Reprints and permissions information is available at http://www.nature.com/reprints. 


[Extended Data Fig. 1a: multiple sequence alignment of DUF1874-family members; the residue-level alignment is not recoverable from the extracted text.]


Extended Data Fig. 1 | Multiple sequence alignment of DUF1874-family
members, and purity of DUF1874 and CRISPR ancillary enzymes used in
biochemical assays. a, This multiple sequence alignment includes the AcrIII-1
proteins from the archaeal viruses SIRV1, STIV, AFV3, ARV1, SIFV, SMV4 and
ATV, the ICEBs1 protein YddF from B. subtilis, the bacteriophage proteins from
Thermoanaerobacterium phage THSA-485A, Synechococcus phage S-CBWM1,
Fusobacterium phage Fnu1 and Hydrogenobaculum phage 1, and the Crn2
protein from Crenothrix polyspora. Conserved residues H47, R66, R85 and E88
are indicated by asterisks. Light and dark grey shading indicate regions of
partial and strong sequence conservation, respectively. b, SDS-PAGE of SIRV1
gp29 (wild-type, H47A and E88A variants), YddF, the Crn1 enzyme Sso2081, and
the Csx1 enzyme Sso1389. The gel is representative of two or more biological
replicates.

Extended Data Fig. 2 | Construction of RJW007 Δtype I-A and RJW007 Δtype
I-A Δcsx1 mutant strains. a, Genomic context of the CRISPR system in the
genetic host (S. islandicus RJW007) and in mutant strains. A1 and A2 denote two
different CRISPR arrays, the orientations of which are indicated with arrows.
b, PCR verification of Δtype I-A mutants. A representative Sulfolobus
transformant with integrated type I-A knockout plasmid was grown in dextrin-
tryptone liquid medium, and the cell cultures were plated on dextrin-tryptone
plates containing 5-fluoroorotic acid (5-FOA, 50 μg ml⁻¹), uracil (20 μg ml⁻¹)
and agmatine (1 mg ml⁻¹). Seven randomly selected 5-FOA-resistant (5-FOAᴿ)
colonies were screened using primers that bind outside of the flanking
homologous regions to confirm the type I-A deletion. A representative Δtype
I-A mutant was further colony purified for subsequent experiments. The
expected sizes of the PCR products amplified from the genomic DNA of the
parental strain (referred to as wild type, wt) and the Δtype I-A mutant are
8,830 base pairs (bp) and 3,001 bp, respectively. The minus symbol denotes a
negative control (using water as the template for PCR). L, log-2 DNA ladder
(NEB). Seven biological replicates were screened. c, PCR analysis of the RJW007
Δtype I-A Δcsx1 mutant and its parental strain RJW007 Δtype I-A using primers
that anneal to the outside of the flanking homologous regions of csx1,
generating amplicons of 2,312 bp and 3,650 bp, respectively. Minus symbol,
negative control (using water as the template for PCR). L, Gene Ruler Express
DNA ladder (Thermo Scientific). The experiment was carried out once. d, Plaque
counts for the three strains tested (n = 3 biological replicates).

Extended Data Fig. 3 | Effect of DUF1874 on plasmid immunity provided by a
heterologously expressed M. tuberculosis type III-A CRISPR system,
providing cA₄- or cA₆-mediated immunity. Unprocessed images of sample
plates are shown for all replicates (two biological replicates with four technical
replicates each; n = 8). Cell-culture dilutions are indicated above the plates.

Extended Data Fig. 4 | Substrate preference of the AcrIII-1 proteins SIRV1
gp29 and YddF, and effective range of cA₄ degradation. a–d, TLC images
visualizing (under 254 nm UV light) cA₄ and cA₆ (450 μM) degradation by SIRV1
gp29 (a, b) and YddF (c, d) over time (in minutes). Both AcrIII-1 enzymes display
a clear preference for cA₄ over cA₆. All TLC images are representative of three
technical replicates. e, Denaturing PAGE showing activation of Csx1 (0.5 μM
dimer) by the indicated amounts (500–0.5 μM) of HPLC-purified cA₄, and its
subsequent deactivation when either AcrIII-1 or Crn1 (2 μM dimer) was present
to degrade cA₄. The AcrIII-1 enzyme degraded 100-fold more cA₄ than did Crn1.
The control reaction (C) shows RNA incubated with Csx1 in the absence of cA₄
(n = 3 technical replicates). For gel source data, see Supplementary Fig. 1.

Extended Data Fig. 5 | Structure of SIRV1 gp29 bound to cA₄. a, b, Orthogonal
views of SIRV1 gp29 dimer in complex with cA₄. The protein monomers are
coloured purple and gold, with catalytic residue H47 from the apo structure
shown in salmon. cA₄ is shown as a spacefill model, with green, blue, red and
orange representing carbon, nitrogen, oxygen and phosphorus atoms,
respectively. Conserved residues (Extended Data Fig. 1) in the AcrIII-1 family are
indicated and discussed in the text. c, Interactions between each monomer of
the SIRV1 dimer (orange and blue), with cA₄ shown in green. d, Diagram showing
the interaction between SIRV1 gp29 and cA₄. Dotted lines represent hydrogen
bonds, with distances annotated. Spheres represent water molecules.


Article 


[Figure panels. a, TLC phosphorimages of SIRV1 gp29 H47A (4 μM dimer) with and without 0.5 M imidazole; time points from 10 to 150 min. b, Plot against time (min), 0–160 min, with annotated rate constants: wt, k = 5.4 min⁻¹; E88A, k = 0.064 min⁻¹; H47A + 0.5 M imidazole, k = 0.019 min⁻¹; H47A, k = 0.0025 min⁻¹.]


Extended Data Fig. 6 | Single-turnover cA₄ cleavage by SIRV1 gp29 and
variants, and chemical rescue with imidazole. a, Phosphorimage of TLC
visualizing cA₄ cleavage by SIRV1 gp29 H47A (4 μM dimer, 50 °C) in the presence
or absence of 500 mM imidazole, over time. The rate of cA₄ cleavage to
generate A₂>P and A₂-P was calculated by quantifying densitometric signals
from the phosphorimage (n = 3 technical replicates). b, Plot comparing the
single-turnover rates of cA₄ cleavage by SIRV1 gp29, its E88A variant and its
H47A variant, in the presence or absence of imidazole. Cleavage of cA₄ by the
H47A variant can be partially restored when the reaction is supplemented with
500 mM imidazole. Data are mean and s.d. (n = 3 technical replicates). For gel
source data, see Supplementary Fig. 1.
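
The rate constants annotated in panel b can be turned into half-lives and a fold rescue with a few lines of arithmetic. The sketch below is only illustrative: it assumes that single-turnover cleavage follows a single-exponential (first-order) time course, which the caption does not state explicitly, and it reads the garbled figure annotations as rate constants in min⁻¹. The function and dictionary names are hypothetical.

```python
import math

# Rate constants (min^-1) as read from the annotations in Extended Data Fig. 6b.
# Assumption: units are min^-1 and each curve is a single exponential.
RATES = {
    "wt": 5.4,
    "E88A": 0.064,
    "H47A + imidazole": 0.019,
    "H47A": 0.0025,
}


def fraction_cleaved(k, t):
    """Fraction of substrate cleaved after t minutes for a first-order process."""
    return 1.0 - math.exp(-k * t)


def half_life(k):
    """Time (min) for half of the substrate to be cleaved."""
    return math.log(2) / k


def fold_change(k_a, k_b):
    """Ratio of two rate constants."""
    return k_a / k_b


if __name__ == "__main__":
    for name, k in RATES.items():
        print(f"{name}: t1/2 = {half_life(k):.2f} min, "
              f"cleaved at 150 min = {fraction_cleaved(k, 150):.3f}")
    rescue = fold_change(RATES["H47A + imidazole"], RATES["H47A"])
    print(f"Imidazole rescue of H47A: {rescue:.1f}-fold")
```

Under these assumptions, imidazole raises the H47A rate several-fold but leaves it far below the wild-type rate, which is consistent with the caption's description of a partial rescue.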


[Phylogenetic tree figure. Legend: Archaea; Bacteria; +CRISPR; Archaeal MGE; Bacterial MGE. Tree scale: 1.]
Extended Data Fig. 7 | Maximum likelihood phylogeny of AcrIII-1
homologues. The maximum likelihood phylogenetic tree was constructed
with automatic selection of the best-fit substitution model for a given
alignment (LG+G+I). Red circles indicate 95–100% branch support, as
assessed using aBayes implemented in PhyML. The scale bar represents the
number of substitutions per site. [...] archaea; black, bacteria; [...]
homologues are associated with [...] plasmids; orange, bacteriophages.




Crenothrix polyspora: CARF, Cas2, Cas1, Crn2, Csx1, Csm1, Csm2, Csm3, Csm4, Csm5, Cas6

Methylovulum psychrotolerans: Cas6, Cmr6, Cmr5, Cmr4, Cmr2, Cmr2, Cmr1, Csx1, CARF, Cas2, Cas2, Cas1, RT, Crn2

Methylomagnum ishizawai: CARF, Cmr1, Cmr2, Cmr3, Cmr4, Cmr5, Cmr6, DUF1887, Crn2

Thioalkalivibrio sulfidiphilus: CARF, Cmr1, Cmr2, Cmr3, Cmr4, Cmr5, Cmr6, CARF, DUF1887, Cas1, Cas2, Csx1

Marinitoga piezophila: Cas1, Cas2, Ago, Csx1-Crn2, CARF-RelE, Cas6, Cmr6, Cmr5, Cmr4, Cmr2, Cmr2, Cmr1

Extended Data Fig. 8 | Genomic context of crn2 genes in selected bacteria.
Type III CRISPR loci in the bacterial species Crenothrix polyspora,
Methylovulum psychrotolerans, Methylomagnum ishizawai, Thioalkalivibrio
sulfidiphilus and Marinitoga piezophila are shown, with genes labelled and
colour coded. The crn2 gene is shown in pale yellow with a bold outline;
CRISPRs are indicated by small black arrowheads; and unrelated/hypothetical
genes are shown as small white arrows. The sizes and orientations of genes are
not reflected. Ago, Argonaute; CARF, CRISPR-associated Rossmann fold; CARF-
RelE, CARF domain fused to the RelE toxin; DUF1887, predicted CARF nuclease;
RT, reverse transcriptase.


Query UniProt ID: A0A0B0MST3; Candidatus Accumulibacter sp. SK-02; NCBI Taxon ID: 1453999; ENA ID: JDST02000051
Query UniProt ID: A0A0BEWXR6; Pyrinomonas methylaliphatogenes; NCBI Taxon ID: 454194; ENA ID: CBXV010000005
Query UniProt ID: A0A0H2SHMB; Marinitoga sp. 1155; NCBI Taxon ID: 1428448; ENA ID: AZAX01000015
Query UniProt ID: A0A0Q0ZXF1; Candidatus Cloacimonas sp. SDB; NCBI Taxon ID: 1732214; ENA ID: LKUH01000343
Query UniProt ID: A0A101XKN2; Thermocladium sp. ECH_B; NCBI Taxon ID: 1714261; ENA ID: LOBW01000003
Query UniProt ID: A0A172UBP0; Methylomonas sp. DH-1; NCBI Taxon ID: 1727196; ENA ID: CP014360
Query UniProt ID: A0A191ZFJ2; Halothiobacillus sp. LS2; NCBI Taxon ID: 1860122; ENA ID: CP016027
Query UniProt ID: A0A189C076; Acidithiobacillus ferrivorans; NCBI Taxon ID: 160808; ENA ID: MASQ01000070
Query UniProt ID: A0A1C4F 161; Gordonia sp. v-85; NCBI Taxon ID: 1761786; ENA ID: FMAX01000019
Query UniProt ID: A0A1E7YKH6; Acidithiobacillus caldus; NCBI Taxon ID: 33059; ENA ID: LZYE01000340
Query UniProt ID: A0A1E7YNMB; Acidithiobacillus caldus; NCBI Taxon ID: 33059; ENA ID: LZYE01000130
Query UniProt ID: A0A1J4XCG8; Proteobacteria bacterium CG1_02_64_396; NCBI Taxon ID: 1805333; ENA ID: MNWR01000061
Query UniProt ID: A0A1M4VB27; Vibrio gazogenes DSM 21264 = NBRC 103151; NCBI Taxon ID: 1123492; ENA ID: FQUH01000002
Query UniProt ID: A0A1Q7YGE8; Acidobacteria bacterium 13_1_20CM_3_53_8; NCBI Taxon ID: 1803431; ENA ID: MNJR01000039
Query UniProt ID: A0A1Y6D2T3; Methylomagnum ishizawai; NCBI Taxon ID: 760988; ENA ID: FXAM01000001
Query UniProt ID: A0A256YX31; Candidatus Bathyarchaeota archaeon ex4484_40; NCBI Taxon ID: 2012513; ENA ID: NJDU01000040
Query UniProt ID: A0A259NGX7; Halothiobacillus sp. 15-55-196; NCBI Taxon ID: 1970382; ENA ID: NCKC01000035
Query UniProt ID: A0A2G6CVS0; Proteobacteria bacterium; NCBI Taxon ID: 1977087; ENA ID: PDPE01000048
Query UniProt ID: A0A3M1DDB7; Candidatus Parcubacteria bacterium; NCBI Taxon ID: 2053309; ENA ID: RFKD01000004
Query UniProt ID: A0A3M1FVQ7; Candidatus Parcubacteria bacterium; NCBI Taxon ID: 2053309; ENA ID: RFJQ01000039
Query UniProt ID: A0A418RRPS; Methylococcales bacterium; NCBI Taxon ID: 2304002; ENA ID: QVQE01000105
Query UniProt ID: ABUUV4; Hydrogenivirga sp. 128-5-R1-1; NCBI Taxon ID: 392423; ENA ID: ABHJ01000005
Query UniProt ID: BEGSH5; Thioalkalivibrio sulfidiphilus (strain HL-EbGR7); NCBI Taxon ID: 396588; ENA ID: CP001339
Query UniProt ID: C3MY28; Sulfolobus islandicus (strain M.14.25 / Kamchatka #1); NCBI Taxon ID: 427317; ENA ID: CP001400
Query UniProt ID: H2J4R5; Marinitoga piezophila (strain DSM 14283 / JCM 11233 / KA3); NCBI Taxon ID: 443254; ENA ID: CP003257
Query UniProt ID: S7ULU1; Desulfococcus multivorans DSM 2059; NCBI Taxon ID: 1121405; ENA ID: ATHJ01000127
Query UniProt ID: V4JQM9; uncultured Thiohalocapsa sp. PB-PSB1; NCBI Taxon ID: 1385625; ENA ID: AVFR01000098

Scale bar, 3 kbp.

Key:
AcrIII-1 (DUF1874; PF08960)
Cas1 (PF01867)
Cas2 (PF09827)
Cas3 (HD_6; PF18019)
Cas4 (PF01930)
Cas6 (PF10040)
Csm1_B (PF18211)
Csm2_III (PF03750)
Csm4_C (PF17953)
Cmr2 (DUF3692; PF12469)
Cmr3 (PF09700)
Cmr5 (PF09701)
Cas_GSU0053 (PF09617)
Cas_GSU0054 (PF09609)
Csx13, SSO2081-like (Cas_NE0113; PF09623)
Csy4 (PF09618)
CARF (DUF1887; PF09002)
CARF (Cas_APE2256; PF09651)
CARF, Csm6 (Cas_Cas02710; PF09670)
CARF, Csx1-like (Cas_DxTHG; PF09455)
HEPN (PF05168)
RAMPs (PF03787)
RVT_1 (PF00078)
WYL (PF13280)


Extended Data Fig. 9 | CRISPR-associated AcrIII-1 homologues. Genomic neighbourhoods were analysed using the Enzyme Function Initiative genome
neighbourhood tool (EFI-GNT) against the Pfam profile database*’. Gene annotations are colour coded according to the key at the right.




Extended Data Table 1 | Data collection and refinement statistics
for AcrIII-1 in complex with cA₄

                              AcrIII-1 with cA₄
Data collection
  Space group                 P1
  Cell dimensions
    a, b, c (Å)               49.8, 51.7, 85.6
    α, β, γ (°)               80.2, 89.7, 83.4
  Resolution (Å)              50.63–1.55 (1.58–1.55)*
  Rsym or Rmerge              0.12 (0.36)
  I/σI                        12.3 (1.7)
  Completeness (%)            98.6 (92.4)
  Redundancy                  2.9 (1.8)
Refinement
  Resolution (Å)              84.26–1.55
  No. reflections             113882
  Rwork / Rfree               0.20 / 0.25
  No. atoms
    Protein                   7,365
    Ligand/ion                352
    Water                     595
  B-factors
    Protein                   20.2
    Ligand/ion                13.3
    Water                     30.7
  R.m.s. deviations
    Bond lengths (Å)          0.012
    Bond angles (°)           1.64


*Values in parentheses are for the highest-resolution shell. R, residual factor; I, intensity.

DNA replication is a tightly regulated process that ensures the precise duplication of
the genome during the cell cycle’. In eukaryotes, the licensing and activation of
replication origins are regulated by both DNA sequence and chromatin features’.
However, the chromatin-based regulatory mechanisms remain largely
uncharacterized. Here we show that, in HeLa cells, nucleosomes containing the
histone variant H2A.Z are enriched with histone H4 that is dimethylated on its lysine
20 residue (H4K20me2) and with bound origin-recognition complex (ORC). In vitro
studies show that H2A.Z-containing nucleosomes bind directly to the histone lysine
methyltransferase enzyme SUV420H1, promoting H4K20me2 deposition, which is in
turn required for ORC1 binding. Genome-wide studies show that signals from
H4K20me2, ORC1 and nascent DNA strands co-localize with H2A.Z, and that depletion
of H2A.Z results in decreased H4K20me2, ORC1 and nascent-strand signals
throughout the genome. H2A.Z-regulated replication origins have a higher firing
efficiency and early replication timing compared with other origins. Our results
suggest that the histone variant H2A.Z epigenetically regulates the licensing and
activation of early replication origins and maintains replication timing through the
SUV420H1–H4K20me2–ORC1 axis.


In eukaryotes, DNA-replication origins are first licensed in G1 phase
by the pre-replication complex’; the licensed origins are then selectively
activated during S phase’. In budding yeast, the ORC recognizes
autonomously replicating sequences (ARSs) to achieve origin licensing‘.
In metazoans, which lack ARSs, replication origins are determined
by both DNA sequence and chromatin-associated factors”. Of these
chromatin features, it has been reported? that the histone modification
H4K20me2 is recognized by ORC1. Given the broad distribution
of H4K20me2 across the genome’, however, other factors must be
involved to precisely define the function of H4K20me2 in DNA replication.
Genome-wide studies have shown that the histone variant H2A.Z is
also enriched at replication origins”*. However, whether the enrichment
of H2A.Z has a functional role during DNA replication remained unclear.


H4K20me2 and ORC1 recruitment by H2A.Z


We first found that knocking down the H2AFZ genes in HeLa cells results
in cell growth defects (Fig. 1a), but not in apoptosis or senescence
(Extended Data Fig. 1a, b). Further analysis of H2AFZ-knockdown cells
revealed a defect in incorporation of the replication marker
bromodeoxyuridine (BrdU), along with a decreased proportion of S-phase
and an increased proportion of G1-phase cells (Extended Data Fig. 1c),
indicating that the cells are arrested at the G1/S boundary. Next, we
synchronized cells at G2/M phase and induced H2A.Z degradation
using a knock-in auxin-inducible degron (AID) tag’. Six hours after
release from G2/M arrest, H2A.Z-depleted cells (with depletion having
been triggered with the auxin indole-3-acetate, or IAA; Extended Data
Fig. 1d) showed a lower proportion of S-phase cells (dashed green line)
and a higher proportion of G1-phase cells (dashed red line) (Fig. 1b).
This finding supports the idea that cells become arrested at the G1/S
boundary after H2A.Z depletion.

Although we identified some genes that are differentially expressed
after H2AFZ knockdown (Extended Data Fig. 1e), we did not find any
enriched terms relating to cell-cycle regulation. Reverse
transcription-polymerase chain reaction (RT-PCR) analysis of genes involved
in S-phase cell-cycle control (including CCNE1 and CDK2) and DNA
replication (including ORC1, ORC2, the DNA helicase MCM2, CDC6,
PCNA and RPA1) revealed that H2AFZ knockdown did not substantially
change the expression of these genes (Extended Data Fig. 1f). Next,
using mass spectrometry, we found that the subunits of the
pre-replication complex, including ORC1, ORC2 and MCM2-7, were enriched
on H2A.Z nucleosomes compared with H2A nucleosomes (Fig. 1c and


¹National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China. ²University of Chinese
Academy of Sciences, Beijing, China. ³Key Laboratory of Infection and Immunity, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China. ⁴Ministry of Education (MOE) Key
Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China. ⁵Laboratory of Proteomics, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China.
⁶Department of Cell Biology, Tianjin Medical University, Tianjin, China. ⁷Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China.
⁸Department of Immunology, School of Basic Medical Sciences, Capital Medical University, Beijing, China. ⁹Institut Curie, PSL Research University, Paris, France. ¹⁰These authors contributed
equally: Haizhen Long, Liwei Zhang, Mengjie Lv, Zengqi Wen. *e-mail: znhumz@sun5.ibp.ac.cn; liguohong@sun5.ibp.ac.cn


576 | Nature | Vol577 | 23 January 2020 


[Fig. 1 panels a–g. Recoverable labels: a, x axis, time (days), 0–4; siH2A.Z. c, x axis, H2A (log10(n + 0.5)); labelled proteins include PCNA, MCM3 and ORC2. d, e, input and Flag-IP lanes for H2A and H2A.Z; blots include H4K20me2, SUV420H1, H4 and Flag. f, H2A versus H2A.Z, P = 0.0360. g, input and Flag-IP lanes for shGFP and shSUV420H1 cells; blots include ORC1.]


Fig. 1 | H2A.Z interacts with the pre-replication complex. a, Analysis of cell
proliferation (left) and western blots (right) for HeLa cells transfected with
negative-control short interfering (si) RNA (siNC) or siRNA that targets H2A.Z
(siH2A.Z). OD450, optical density at 450 nm (an indicator of cell density). b, FACS
analysis of the cell-cycle progression of control cells (-IAA; solid lines) and IAA-
treated cells (+IAA; dashed lines). The yellow shading shows that the cells were
arrested at G1 phase and delayed proceeding into S phase after IAA-induced
H2A.Z depletion. c, The total number of peptides identified from the
immunoprecipitation of Flag-tagged H2A or H2A.Z nucleosomes from three
independent experiments, plotted as log10(n + 0.5) with jitter. The diagonal
line represents the threshold of twofold enrichment on H2A.Z nucleosomes.
d, e, Western blot analysis of ORC1, ORC2 and MCM2 (d) or histone
modifications and SUV420H1 (e) from immunoprecipitation of Flag-tagged
H2A or H2A.Z mononucleosomes. f, Mass-spectrum analysis of the H4K20me2
modification from the samples in Fig. 1e. The y axis shows the percentage of
H4K20 peptides that are modified as H4K20me2. g, Western blot analysis of
ORC1, ORC2 and MCM2 from immunoprecipitation of Flag-tagged H2A or
H2A.Z mononucleosomes from cells stably expressing a control short hairpin
(sh) RNA (shGFP) or shRNA against SUV420H1 (shSUV420H1). GFP, green
fluorescent protein. Data in a, b are mean ± s.d.: a, n = 6 technical replicates;
b, n = 3 biological replicates. Data in f are mean ± s.e.m.; n = 4 biological
replicates, two-tailed, paired t-test. Western blots in a, d, e, g were
independently repeated three times with similar results, and H4 was used as a
loading control and sample processing control. For gel source data, see
Supplementary Fig. 1. For the FACS gating strategy, see Supplementary Fig. 3.
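
The transform and threshold described for panel c can be sketched as follows. This is a minimal illustration and not the authors' analysis code: it assumes that the twofold threshold is applied to the offset counts (n + 0.5), that jitter is a small uniform displacement added for display only, and that the jitter width of 0.05 log units is arbitrary. All function names are hypothetical.

```python
import math
import random


def log_count(n):
    """log10(n + 0.5); the 0.5 offset keeps zero peptide counts finite."""
    return math.log10(n + 0.5)


def jittered(n, rng, scale=0.05):
    """Plotting coordinate: log_count plus uniform jitter (assumed width)."""
    return log_count(n) + rng.uniform(-scale, scale)


def enriched_twofold(n_h2az, n_h2a):
    """True if a protein lies above the twofold diagonal on the log-log plot."""
    return log_count(n_h2az) - log_count(n_h2a) > math.log10(2)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical peptide counts (H2A.Z pull-down, H2A pull-down).
    for n_z, n_a in [(10, 2), (3, 2), (0, 0)]:
        x, y = jittered(n_a, rng), jittered(n_z, rng)
        print(n_z, n_a, round(x, 3), round(y, 3), enriched_twofold(n_z, n_a))
```

Jitter is applied only to the plotted coordinates, so the enrichment call itself is made on the unjittered counts.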


Supplementary Table 1). We confirmed the enrichment of ORC1, ORC2
and MCM2 by western blotting (Fig. 1d).

As ORC1 might stabilize the binding of other ORC subunits at origins
during G1 phase”, we tested the interaction between H2A.Z and ORC1
using the LacO/LacI targeting system. However, we did not find a direct
interaction (Extended Data Fig. 1g), suggesting that a bridge is required
to recruit ORC1 onto H2A.Z nucleosomes. The bromo adjacent homology
(BAH) domain of ORC1 has been reported to specifically recognize
the H4K20me2 peptide’, suggesting that ORC1 may be recruited onto
H2A.Z nucleosomes through H4K20me2. Indeed, we found that both
H4K20me2 and SUV420H1 were enriched on H2A.Z nucleosomes, in
both unsynchronized (Fig. 1e) and G1-synchronized (Extended Data
Fig. 1h) cells. Mass-spectrometry analysis of H2A and H2A.Z
mononucleosomes showed that whereas H4K20me2 is abundant on both H2A
and H2A.Z nucleosomes, it is relatively more enriched on the H2A.Z
variant (Fig. 1f and Extended Data Fig. 2a, b). In addition, knockdown of
SUV420H1 alone or of both SUV420H1 and SUV420H2 (which encode the
two enzymes" that catalyse methylation of H4K20me2) abolished the
enrichment of ORC1, ORC2 and MCM2 on H2A.Z nucleosomes (Fig. 1g
and Extended Data Fig. 2c-e). These results suggest that H2A.Z recruits
ORC1 in an H4K20me2-dependent manner. However, knockdown of
SUV420H2 alone had little effect on the binding of ORC1 onto H2A.Z
nucleosomes (Extended Data Fig. 2d, e).


H2A.Z recruits SUV420H1 to deposit H4K20me2


Methyltransferase assays showed that, compared with H2A
mononucleosomes, H2A.Z mononucleosomes greatly enhanced the
histone-methylation activity of recombinant human SUV420H1 (Fig. 2a and
Extended Data Fig. 3a). Mass-spectrometry analysis of the modification
products from the histone-methyltransferase reactions revealed
that the main product of SUV420H1 activity on H2A.Z nucleosomes
is H4K20me2, and that this is much more common on H2A.Z than on
H2A nucleosomes (Extended Data Fig. 3b). We validated this result
by western blotting (Extended Data Fig. 3c). It has been reported that
SUV420H1 produces H4K20me2 from H4K20me1 in vivo”. In line with
this, we found that H2A.Z also promoted the activity of SUV420H1 on
nucleosomes containing H4Kc20me1 (with Kc being a lysine replaced
by cysteine for chemical modification; Extended Data Fig. 3d).

Next, we generated four chimaeric mutants of H2A.Z, containing
regions that had been replaced by the corresponding regions of H2A
(Extended Data Fig. 3e). When the acidic patch of H2A.Z was substituted
with the corresponding domain of H2A, the activity of SUV420H1
was markedly reduced (Extended Data Fig. 3e). The residues D97 and
S98 in the acidic patch have been reported’*"* to be critical for the
structural and biological functions of H2A.Z. Our results show that
SUV420H1 has very low methylation activity on H2A.Z(D97N/S98K) mutant
nucleosomes (Fig. 2b). Using mononucleosome pulldown assays, we
found that SUV420H1 binds more strongly to H2A.Z nucleosomes
than to H2A nucleosomes (Fig. 2c). Moreover, mutation of D97 and
S98 to N97 and K98 in H2A.Z impaired the binding of SUV420H1 to
nucleosomes (Extended Data Fig. 3f). We confirmed the binding of
SUV420H1 to H2A.Z nucleosomes in vivo using the LacO/LacI targeting
system (Extended Data Fig. 3g, h). Next, we simulated the binding
between SUV420H1 and H2A.Z mononucleosomes using structural
data for SUV420H1 (ref. *) and the H2A.Z nucleosome’. We found that
the R257 and K333 residues of SUV420H1 are important for the interaction
with H2A.Z nucleosomes (Extended Data Fig. 4a, b). Indeed, both
SUV420H1(R257A) and SUV420H1(K333A) mutants could not bind H2A.Z
nucleosomes as efficiently as wild-type SUV420H1 (Fig. 2d). The methylation
activity of these two mutants on H2A.Z nucleosomes was also reduced
(Extended Data Fig. 4c). Together, these data show that residues D97
and S98 of H2A.Z, and R257 and K333 of SUV420H1, are essential for
the binding of SUV420H1 and the enhancement of its activity.

A pulldown assay of biotinylated mononucleosomes showed that
binding of ORC1 to H2A.Z nucleosomes is substantially enhanced
by SUV420H1-catalysed H4K20me2 (Fig. 2e). Moreover, ORC1 binds
weakly to the histone-methyltransferase products of H2A.Z/H4(K20A)
nucleosomes (Fig. 2f) or H2A.Z(D97N/S98K) nucleosomes (Fig. 2g). We
confirmed that the enrichment of H4K20me2, ORC1, ORC2 and MCM2
on H2A.Z nucleosomes is greatly impeded by the H2A.Z(D97N/S98K) mutation
in vivo (Fig. 2h). Using H2A.Z nucleosomes containing H4Kc20me2
as substrates (Extended Data Fig. 4d), we found that mutations in the
BAH domain of ORC1, which abolish the interaction between ORC1 and
H4K20me2 (ref.*), greatly impaired the interaction between ORC1 and
H2A.Z nucleosomes containing H4Kc20me2 (Extended Data Fig. 4e).
These results support the idea that the binding of ORC1 to H2A.Z
nucleosomes depends on the BAH domain of ORC1 and on
SUV420H1-catalysed H4K20me2.

To test whether the density of H4K20me2 has an effect on ORC1
binding, we first assembled H2A and H2A.Z mononucleosomes containing
zero (unmodified H4), one (heterotypic, 50%) or two (homotypic,
100%) H4 histones with the H4Kc20me2 modification. As the
H4Kc20me2 density increased, ORC1 bound more strongly to H2A and




[Fig. 2 panels a–h. Recoverable labels: a, H2A and H2A.Z mononucleosomes with SUV420H1 at 0×, 1× and 2×; autograph and Coomassie blue; P < 1 × 10⁻⁴. d, wild type, R257A and K333A SUV420H1. e–g, reactions with or without SUV420H1 and SAM, with ORC1 added; biotin pulldown; blots for ORC1, H4K20me2 and H4; f, H2A.Z/H4K20A. h, input and Flag-IP; blots for ORC1, ORC2, MCM2 and H4K20me2.]


Fig. 2 | H2A.Z binds SUV420H1 to promote H4K20me2 deposition and
thereby recruit ORC1. a, b, Upper panels, ³H autograph showing
methyltransferase activity of SUV420H1 on wild-type H2A or H2A.Z
mononucleosomes (a) and on mononucleosomes containing H2A.Z point
mutations (b). Bottom, quantitative analysis of the ³H signal by liquid
scintillation. Mut3 is an H2A.Z mutant whose acidic patch is replaced with the
corresponding region of H2A (Extended Data Fig. 3e). CPM, counts per minute.
c, Western blot analysis of SUV420H1 from biotin pulldown samples. d, Western
blot analysis of wild-type and mutant SUV420H1 from mononucleosome biotin
pulldown samples. e, Left, diagram showing the histone methyltransferase
activity of SUV420H1 on nucleosomes (top) and the subsequent binding of
ORC1 to the products of SUV420H1's methyltransferase activity, detected by
biotin pulldown. SAM, S-adenosyl methionine. Right, western blot analysis of


H2A.Z nucleosomes (Extended Data Fig. 4f). Next, we assembled H2A.Z
polynucleosomes with an increasing ratio of an H4Kc20me2 octamer.
We found that ORC1 binding was weak when the density of H4Kc20me2
was 25%, increasing gradually with increasing density of H4Kc20me2
(from 50% to 100%) (Extended Data Fig. 4g). These results suggest that
ORC1 binds to chromatin in an H4K20me2-dosage-dependent manner.
Thus, H2A.Z binds SUV420H1 directly to promote H4K20me2 deposition
on H2A.Z nucleosomes, and the enhanced H4K20me2 deposition
is essential for recruiting ORC1 to H2A.Z nucleosomes.


H2A.Z controls replication origin firing 

To investigate how H2A.Z regulates H4K20me2 deposition and ORC1 
binding at the genome-wide level, we used chromatin immunoprecipita- 
tion with DNA sequencing (ChIP-seq) to map 58,642, 99,574 and 100,917 
peaks for H2A.Z, H4K20me2 and ORC1, respectively, in HeLa cells
(Fig. 3a). The H4K20me2 ChIP-seq signal was markedly reduced after
both SUV420H1 and SUV420H2 were knocked down (Extended Data
Fig. 5a). H4K20me1 and H4K20me3 partially overlap with H4K20me2
(Extended Data Fig. 5b), and their ChIP-seq signal increased slightly at
the H4K20me2 peak regions after H2AFZ knockdown (Extended Data
Fig. 5c). These results validate the specificity of our H4K20me2 ChIP-seq
data. The peaks of H2A.Z, H4K20me2 and ORC1 overlap highly with each
other (Fig. 3a), and both H4K20me2 and ORC1 levels correlate positively
with the H2A.Z level (Extended Data Fig. 5d, e). Moreover, after H2AFZ
knockdown, the H4K20me2 and ORC1 levels decrease noticeably at
the H2A.Z peak regions that overlap with both H4K20me2 and ORC1
(Fig. 3b, c). Mass-spectrometry analysis of H4K20me2 after H2AFZ

the binding of ORC1 to SUV420H1 products. f, Western blot analysis of the
binding of ORC1 to SUV420H1's histone methyltransferase products, using
H2A.Z mononucleosomes containing wild-type H4 or mutant H4,59, as
substrates. g, Western blot analysis of the binding of ORC1 to SUV420H1
products, using wild-type H2A.Z or H2A.Z(D97N/S98K) mononucleosomes as
substrates. h, Western blots showing the distribution of ORC1, ORC2, MCM2
and H4K20me2 on H2A, H2A.Z and H2A.Z(D97N/S98K) mononucleosomes. ³H
quantification data in a, b are mean ± s.e.m.; n = 3 biological replicates;
two-tailed, unpaired t-test. The ³H autograph experiments in a, b and western blots
in c-h were independently repeated three times with similar results. H4 was
used as a loading control and sample processing control. For gel source data,
see Supplementary Fig. 1.


knockdown validates the global decrease in H4K20me2 (Extended 
Data Fig. 5f). These results support the critical role of H2A.Z in
regulating H4K20me2 and ORC1 levels genome wide. Given that ORC1 is
degraded during S phase, we analysed ORC1 binding to chromatin in
G1/S-phase-arrested cells. The results show that the chromatin fraction
of ORC1 (and levels of SUV420H1, MCM2 and H4K20me2) decreased
after H2AFZ knockdown (Extended Data Fig. 5g), excluding an effect of
cell-cycle change on ORC1 binding after H2AFZ knockdown.
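
The peak-overlap statements above (for example, H2A.Z peaks that overlap both H4K20me2 and ORC1 peaks) reduce to interval intersection on each chromosome. The following is a minimal sketch of that idea, not the authors' pipeline (the reporting summary lists bedtools among the analysis software). The function names and the toy coordinates are hypothetical, and peaks are assumed to be half-open (chrom, start, end) intervals.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlapping(query, reference):
    """Return the query peaks that overlap at least one reference peak.

    Brute-force comparison: fine for a sketch, too slow for ~100,000 peaks.
    """
    return [q for q in query if any(overlaps(q, r) for r in reference)]


# Toy peak sets (hypothetical coordinates, for illustration only).
h2az = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 150)]
h4k20me2 = [("chr1", 250, 400), ("chr2", 500, 600)]
orc1 = [("chr1", 280, 350), ("chr1", 1100, 1150)]

# H2A.Z peaks that overlap both H4K20me2 and ORC1 peaks.
shared = overlapping(overlapping(h2az, h4k20me2), orc1)
print(shared)  # [('chr1', 100, 300)]
```

For genome-scale peak sets, a sorted sweep or an interval tree would replace the nested comparison; the overlap criterion itself stays the same.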

To investigate whether H2A.Z regulates the firing of replication
origins, we mapped the active replication origins in HeLa cells by
nascent-strand sequencing (NS-seq). We found that treatment with RNase A
markedly reduced the nascent-strand signal (Extended Data Fig. 5h).
Genome-wide, we detected 41,850 nascent-strand peaks (normalized
by the nascent-strand signal following treatment with RNase A), 47.2%
of which co-localized with H2A.Z, H4K20me2 and ORC1 simultaneously
(Fig. 3a). Moreover, this group of nascent-strand peaks had a higher 
read density than other nascent-strand peaks (Fig. 3a). The nascent- 
strand signal also decreased notably after H2AFZ knockdown (Fig. 3b, 
c), suggesting that H2A.Z is essential for origin firing. 
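
As a quick consistency check, the co-localized fraction quoted above follows from the peak counts given in the Fig. 3 legend (19,774 overlapping and 22,076 other nascent-strand peaks). The sketch below also shows the standard RPKM normalization that underlies the read-density comparison. The function names and the example read counts are hypothetical.

```python
def fraction_percent(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


def rpkm(reads_in_region, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return reads_in_region / (region_length_bp / 1e3) / (total_mapped_reads / 1e6)


overlap_peaks = 19_774  # nascent-strand peaks overlapping H2A.Z, H4K20me2 and ORC1
other_peaks = 22_076    # remaining nascent-strand peaks
total_peaks = overlap_peaks + other_peaks

print(total_peaks)                                   # 41850
print(fraction_percent(overlap_peaks, total_peaks))  # 47.2
# Hypothetical example: 500 reads in a 2-kb peak, 20 million mapped reads.
print(rpkm(500, 2_000, 20_000_000))                  # 12.5
```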

To further analyse the regulatory role of H2A.Z in origin firing, we 
defined the H2A.Z-regulated H4K20me2 peaks as ‘ZD-K20’ (H2A.Z-
dependent-H4K20me2) (Extended Data Fig. 5i), and the remaining
H4K20me2 peaks as ‘ZI-K20’ (H2A.Z-independent-H4K20me2). We
also defined ‘ZD-ORC1’ (H2A.Z-dependent-ORC1) and ‘ZI-ORC1’ (H2A.Z-
independent-ORC1) (Extended Data Fig. 5j). Remarkably, we found
that 73.3% of ZD-K20 peaks overlap with 45.7% of ZD-ORC1 peaks, and
that 70.2% of the nascent-strand peaks overlap with ZD-K20 and ZD-
ORC1 simultaneously (Fig. 3d). We further defined ZD-ORC1 peaks that

Fig. 3 | H2A.Z regulates the recruitment of ORC1 to chromatin in order to
license origins with higher firing efficiency. a, The top Venn diagram shows
the overlap among H2A.Z, H4K20me2 and ORC1 ChIP-seq peaks. The bottom
Venn diagram shows the overlap between the 31,492 ORC1 peaks and nascent-
strand peaks. The box plot shows the nascent-strand signal in the nascent-
strand peaks that overlap the H2A.Z, H4K20me2 and ORC1 peaks (‘overlap’,
n = 19,774) and the other nascent-strand peaks (‘others’, n = 22,076).
RPKM, reads per kilobase per million mapped reads. b, Genome tracks show
the signals of H4K20me2, ORC1 and nascent strands. RFPL3, BPIFC and FBXO7 are
RefSeq gene names. c, Heat maps and violin plots show the signals of H2A.Z,
H4K20me2, ORC1 and nascent strands around the centres of H2A.Z peaks that
overlap both H4K20me2 and ORC1 (n = 19,055). d, Venn diagram showing the
overlap among ZD-K20, ZD-ORC1 and nascent-strand regions. e, Box plots
showing the signals of H2A.Z, H4K20me2, ORC1 and nascent strands in plus-


overlap ZD-K20 and nascent-strand peaks as ‘plus-ZD-ORC1’ (the 20,978
ZD-ORC1 peaks in Fig. 3d), which represents those replication origins
that are regulated by the H2A.Z-SUV420H1-H4K20me2-ORC1 axis,
and the remaining ZD-ORC1 peaks as ‘minus-ZD-ORC1’. We found that
H2A.Z, H4K20me2, ORC1 and nascent-strand signals were all higher
around plus-ZD-ORC1 peaks than around minus-ZD-ORC1 or ZI-ORC1 
peaks (Fig. 3e). These results support the essential role of H2A.Z in 
activating replication origins with a higher firing efficiency through 
the SUV420H1-H4K20me2-ORC1 axis. 
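
The three ORC1 peak classes defined above form a partition of all ORC1 peaks, which can be written as a small decision rule. This is an illustrative sketch only: the boolean flags are assumed to come from upstream analyses (H2A.Z dependence from the knockdown comparison, overlaps from interval intersection), and `classify_orc1_peak` is a hypothetical helper name. The class sizes are those reported in the Fig. 3 legend.

```python
def classify_orc1_peak(h2az_dependent, overlaps_zd_k20, overlaps_nascent_strand):
    """Assign an ORC1 peak to one of the three classes used in the text."""
    if not h2az_dependent:
        return "ZI-ORC1"
    if overlaps_zd_k20 and overlaps_nascent_strand:
        return "plus-ZD-ORC1"
    return "minus-ZD-ORC1"


# Class sizes reported in the Fig. 3 legend.
counts = {"plus-ZD-ORC1": 20_978, "minus-ZD-ORC1": 52_655, "ZI-ORC1": 27_284}
zd_orc1_total = counts["plus-ZD-ORC1"] + counts["minus-ZD-ORC1"]
all_orc1 = zd_orc1_total + counts["ZI-ORC1"]

print(zd_orc1_total)  # 73633
print(all_orc1)       # 100917, matching the number of ORC1 ChIP-seq peaks
```

The sum of the three classes equals the 100,917 ORC1 peaks mapped by ChIP-seq, so the reported class sizes are mutually consistent.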

Analysis of the genome-wide distribution of plus-ZD-ORC1 showed 
that some of these replication origins were located at promoter regions 
(Extended Data Fig. 5k). In addition, we analysed the DNA sequence 
features in the 20,978 plus-ZD-ORC1 peaks. Consistent with previous
observations, we found that G/C-rich and asymmetric A/T-rich
motifs were significantly (P < 0.01) enriched (Extended Data Table 1),
indicating cooperative regulation of origin selection and firing by 
genetic and epigenetic elements. 
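
Extended Data Table 1 states that 400-bp regions around the plus-ZD-ORC1 peak summits were submitted for motif analysis. A minimal sketch of that windowing step, together with a simple G/C-fraction measure, is given below. It is not the RSAT analysis itself; the helper names, the symmetric 200-bp flanks and the toy sequence are assumptions made for illustration.

```python
def summit_window(summit, width=400, chrom_length=None):
    """Half-open [start, end) window of `width` bp centred on a peak summit."""
    half = width // 2
    start = max(0, summit - half)
    end = summit + half
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


def gc_fraction(seq):
    """Fraction of G or C bases among the A/C/G/T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


print(summit_window(10_500))      # (10300, 10700)
print(gc_fraction("CCCACCCCGG"))  # 0.9
```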


H2A.Z regulates early-replication origins 


To investigate whether the origins regulated by H2A.Z show any 
preferred timing for replication, we performed BrdU-seq to map 


ZD-ORC1 (n = 20,978), minus-ZD-ORC1 (n = 52,655) and ZI-ORC1 (n = 27,284)
regions. f, Working model. Top, origin selection: H2A.Z nucleosomes bind
SUV420H1 directly (step 1) to establish H4K20me2 on chromatin (step 2), which
then recruits ORC1 (step 3) to bind to replication origins (red mark; step 4).
Bottom, origin firing: the H2A.Z-SUV420H1-H4K20me2-ORC1 axis selectively
licenses and activates early replication origins. The experiment in b was
independently repeated two times with similar results. Data in a, c, e were
analysed by two-tailed Wilcoxon test; P-values are shown in the box plots. For
the box plots in a and e, the centre line represents the median, the box limits are
the 25th and 75th percentiles, and the whiskers are the minimum to maximum
values. For the violin plots in c, the box centres represent the median, the box
limits are the 25th and 75th percentiles, and the upper and lower limits mark
the 95% confidence interval.


replication timing in HeLa cells. As shown in Extended Data Fig. 6a, 
about 80% of the early-replication domains that we identified overlap 
with those defined previously by BrdU Repli-seq. We also found that
plus-ZD-ORC1 peaks were preferentially enriched at early-replication
domains, whereas both minus-ZD-ORC1 and ZI-ORC1 peaks were more
enriched at late-replication domains (Fig. 4a). Moreover, plus-ZD-ORC1
peaks were more enriched at the centres of early-replication domains
than were the other ORC1 peaks (Extended Data Fig. 6b). Of note, the
10-min BrdU signal (for cells labelled for 10 min immediately after being
released from G1/S arrest) in plus-ZD-ORC1 is higher than in other ORC1
peaks, and it decreased markedly after H2AFZ knockdown (Fig. 4b).
These results suggest that the H2A.Z-SUV420H1-H4K20me2-ORC1 
axis preferentially licenses and activates early-replication origins—a 
conclusion supported by real-time PCR analyses of H4K20me2, ORC1 
and nascent strands (Fig. 4c and Extended Data Fig. 6c). Next we ana- 
lysed dynamic changes in replication timing after H2AFZ knockdown, 
finding that, although the replication timing of early-replication 
domains did not change substantially, the timing of late-replication 
regions was advanced (Fig. 4d, e and Extended Data Fig. 6d, e). We 
then arrested H2A.Z-depleted cells at the G1/S boundary. After release, 
H2A.Z-depleted cells progressed through S phase without any defects 
(Extended Data Fig. 7a, b), suggesting that, after H2AFZ knockdown,

Fig. 4 | H2A.Z regulates early replication origins and replication timing.
a, Graph showing the replication timing of plus-ZD-ORC1 (n = 20,978), minus-
ZD-ORC1 (n = 52,655) and ZI-ORC1 (n = 27,284) peaks. b, Box plot showing the
BrdU signal in siNC or siH2A.Z cells. c, Real-time PCR analysis of the ChIP signal
from H2A.Z, H4K20me2 and ORC1, and nascent-strand signals in siNC or
siH2A.Z cells. d, Box plot showing the dynamics of replication timing of the
2,000 earliest and 300 latest replication origins (nascent-strand peaks).
e, Genome tracks of a relatively late replication domain (red shaded area) show
the increased BrdU signal at 10 min and advanced replication timing after
H2AFZ knockdown. Numbers in square brackets indicate the data range of
the corresponding track. MEX3B, C15orf40 and RPS17 are RefSeq gene names.
Data in c are mean ± s.e.m.; n = 3 biological replicates; two-tailed unpaired
t-test. Data in b, d were analysed by two-tailed Wilcoxon test. P-values are
indicated within b-d. The experiment in e was independently repeated twice
with similar results.


replication timing was reprogrammed to ensure that the whole genome 
was replicated efficiently. 

We next investigated whether the R257 and K333 residues of 
SUV420H1 are involved in activating early origins in vivo. We found 
that neither SUV420H1(R257A) nor SUV420H1(K333A) could rescue H4K20me2
after knocking down endogenous SUV420H1 (Extended Data Fig. 7c).
In addition, through real-time PCR we found that these two SUV420H1 
mutants could not rescue nascent-strand or BrdU signals (labelled for 
10 minutes immediately after release from G1/S arrest) (Extended Data 
Fig. 7d). Thus, H2A.Z regulates the selection and activation of early- 
replication origins through SUV420H1. 


H2A.Z in activated T cells


To study the function of H2A.Z-regulated replication in a more
physiological context, we conditionally knocked out (CKO) H2az1/H2az2
in T cells by generating CD4^Cre H2A.Z^fl/fl mice (Extended Data Fig. 8a).
We found that the number of mature T cells in the spleen decreased
dramatically in H2A.Z CKO mice (Extended Data Fig. 8b, c). H2A.Z
CKO T cells showed reduced BrdU incorporation in the homeostatic
state, and reduced dilution of carboxyfluorescein succinimidyl ester
(CFSE) in activated T cells on stimulation with CD3 and CD28 antibodies
(Extended Data Fig. 8d-f). In addition, fluorescence-activated cell
sorting (FACS) analysis showed that activated T cells from CKO mice 
had a prolonged G1 phase and a shorter S phase than the wild type 
(Extended Data Fig. 8g, h). These results suggest an essential role of 
H2A.Z in DNA replication and T-cell proliferation. 

ChIP-seq analysis showed that the H4K20me2 and nascent-strand 
signals were markedly reduced in H2A.Z CKO activated T cells 
(Extended Data Fig. 8i). Western blot analysis confirmed the global 
loss of H4K20me2 upon H2A.Z depletion in activated T cells (Extended 
Data Fig. 8j). We classified nascent-strand signals into ZD-NS (H2A.Z- 
dependent-nascent strand; n=18,382) and ZI-NS (H2A.Z-independent- 
nascent strand; n = 7,901). We found that H4K20me2, as well as H2A.Z
and nascent strands, in ZD-NS regions were higher than those in ZI-NS
regions (Extended Data Fig. 8k), indicating that, as with HeLa cells, the
H2A.Z-SUV420H1-H4K20me2-ORC1 axis has an important regulatory
role in the selection and firing of replication origins in activated T cells.


Discussion 


We have shown that nucleosomes comprising H2A.Z histones can 
directly bind SUV420H1 to efficiently stimulate the dimethylation of 
H4 K20 residues, thereby licensing and activating early-replication 
origins (Fig. 3f). However, given that H2A.Z nucleosomes are much 
less abundant than canonical H2A nucleosomes in cells, it remains 
unknown how H2A.Z regulates the global abundance of H4K20me2 
in vivo. It has been shown that H4K20me2 levels are passively diluted
twofold during DNA replication, recovering gradually by the next G1
phase. Notably, H2A.Z is lost on nascent chromatin after DNA replication,
and is restored along with H4K20me2, suggesting that it
has an essential role in establishing H4K20me2 on newly synthesized
histones. Recently, a class of cis-regulatory elements called early-replicating
control elements was found to be essential for maintaining the
timing of early replication. Another study showed that poly(dA:dT)
tracts were associated with efficient replication origins. Interestingly,
nucleosomes were depleted at the centres of these poly(dA:dT)
tracts, but strongly positioned at flanking regions, akin to the features
of chromatin structure seen at replication origins in yeast. We have
found here that active early origins are highly enriched with G/C-rich 
and asymmetric A/T-rich motifs. Thus, we speculate that coordination 
between genetic determinants and epigenetic features is involved in 
fine-tuning the licensing and activation of replication origins during 
the cell cycle.

Extended Data Fig. 1 | Analysis of phenotypes and gene expression after
H2AFZ knockdown, and analysis of the interaction between H2A.Z and
ORC1. a, Statistical outcome of FACS analysis of apoptosis of cells treated with
siNC or siH2A.Z oligonucleotides. Annexin is used as a marker of apoptotic
cells; DAPI is 4′,6-diamidino-2-phenylindole, a nuclear marker. b, Representative
images of cells undergoing senescence (within boxed regions), shown by
β-galactosidase staining. c, Cell-cycle analysis of siNC or siH2A.Z HeLa cells.
Cells were pulse labelled with BrdU and stained with propidium iodide (PI), and
then analysed by FACS. d, Western blots showing the H2A.Z level when cells are
released from arrest at G2/M phase after treatment with IAA. AID, auxin-
inducible degron. e, Volcano plot showing genome-wide expression dynamics
after H2AFZ knockdown; n = 33,835. Genes with a log2(fold change) of 1 or more
and P-values of less than 0.01 were selected as differentially expressed genes.
f, Real-time PCR analysis of gene expression in siNC or siH2A.Z cells. The
expression level was normalized to that of the glyceraldehyde-3-phosphate
dehydrogenase gene (GAPDH). g, Left, diagram showing the LacO/LacI
targeting system in AO3_1 cells. An interaction between a LacI-Cherry-tagged
histone and an enhanced green fluorescent protein (EGFP)-tagged ORC protein
would result in an overlap of green fluorescence with red fluorescence. Right,
there is no interaction in this experiment between ORC1 and H2A or H2A.Z.
Scale bar, 5 μm. h, Western blot analysis of the enrichment of H4K20me2,
H3K36me3 (negative control), SUV420H1, ORC1, ORC2 and MCM2 on H2A or
H2A.Z nucleosomes from G1-phase-synchronized cells. Data in panel a are
means; n = 2 biological replicates, with dot plots overlaid. Data in panels c, f are
means ± s.e.m.; n = 3 biological replicates; two-tailed unpaired t-test. The
β-galactosidase staining in panel b, the FACS results in c, the western blots in d,
h, and the fluorescence image in g were independently repeated three times
with similar results. H4 was used as a loading control and sample processing
control in d and h. For gel source data, see Supplementary Fig. 1. For imaging
source data, see Supplementary Fig. 2. For the FACS gating strategy, see
Supplementary Fig. 3.
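
The selection rule for differentially expressed genes in panel e can be written as a simple filter. The sketch below assumes that the fold-change threshold applies to the absolute value of log2(fold change), so that both up- and downregulated genes are retained. That reading, the function name and the example genes are assumptions and are not stated in the legend.

```python
def is_differentially_expressed(log2_fold_change, p_value,
                                min_abs_log2_fc=1.0, max_p=0.01):
    """Apply the legend's thresholds, read as |log2FC| >= 1 and P < 0.01."""
    return abs(log2_fold_change) >= min_abs_log2_fc and p_value < max_p


# Hypothetical genes: (name, log2 fold change, P value).
genes = [
    ("geneA", 1.8, 0.0004),  # upregulated, significant
    ("geneB", -1.2, 0.003),  # downregulated, significant
    ("geneC", 0.4, 0.0001),  # significant, but the change is too small
    ("geneD", 2.5, 0.2),     # large change, but not significant
]
selected = [name for name, lfc, p in genes if is_differentially_expressed(lfc, p)]
print(selected)  # ['geneA', 'geneB']
```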
Extended Data Fig. 2 | H2A.Z interacts with ORC1 in an H4K20me2-
dependent manner. a, b, High-performance liquid chromatography (HPLC) of
unmodified H4 peptide and H4K20me2 peptide in H2A (a) and H2A.Z (b)
nucleosomes. c, Western blot analysis showing the signal of SUV420H1 and
H4K20me1/2/3 in shGFP cells or shSUV420H1 cells with stably expressed Flag-
tagged H2A or H2A.Z. d, Real-time PCR analysis shows the level of expression of
SUV420H1 and SUV420H2 in cells transfected with shRNA targeting GFP,
SUV420H1 or SUV420H2. The expression level was normalized to that of
GAPDH. e, Left, enrichment of ORC1 on H2A or H2A.Z mononucleosomes from
wild-type or SUV420H2-knockdown cells. Right, enrichment of ORC1 on H2A.Z
mononucleosomes from wild-type or SUV420H1/2-knockdown cells. Data in
panel d are means ± s.e.m.; n = 3 biological replicates; two-tailed unpaired
t-test. The experiment in panel a was independently repeated four times with
similar results. The experiments in panels c, e were independently repeated
twice with similar results. H4 was used as a loading control and sample
processing control in c and e, respectively. For gel source data, see
Supplementary Fig. 1.


Extended Data Fig. 3 | H2A.Z enhances the binding of SUV420H1 to promote
its enzymatic activity. a, Liquid scintillation results from analysis of SUV420H1
histone methyltransferase activity, using H2A or H2A.Z mononucleosomes as
substrates (n = 3 biological replicates). b, Mass-spectrometry analysis of
H4K20me2 modification by SUV420H1, using H2A or H2A.Z
mononucleosomes as substrates. c, Western blot analysis of products from
histone methyltransferase assay of SUV420H1 using H2A or H2A.Z
mononucleosomes as substrates. IB, immunoblot. d, Left, mass-spectrometry
analysis of monomethylated and unmethylated H4 histones from chemical
methylation reactions in vitro. Right, western blot analysis and ³H autography
show that H2A.Z promotes the activity of SUV420H1 on an H4Kc20me1
substrate. e, Upper panel, diagram showing four chimaeric mutants of H2A.Z,
with the regions in red replaced with the corresponding regions of H2A.
The sequences of the region containing D97 and S98 of H2A.Z and the
corresponding region of H2A are shown below the diagram (in single-letter
code). Lower panel, ³H autograph and liquid scintillation analysis of the
methyltransferase activity of SUV420H1 on mononucleosomes containing
H2A.Z chimaeric mutants. f, Western blot analysis following the pulldown of
biotinylated mononucleosomes shows an interaction between SUV420H1 and
mononucleosomes containing wild-type H2A.Z or the H2A.Z(D97N/S98K) mutant.
g, The interaction between SUV420H1 and H2A, H2A.Z or H2A.Z(D97N/S98K) was
analysed by LacO/LacI targeting. Scale bar, 5 μm. h, Statistical results from the
LacO/LacI targeting assay of panel g, showing the percentage of cells in which
EGFP-SUV420H1 co-localizes with the indicated histones. Data in panel a are
means ± s.d.; n = 3 biological replicates. Data in panel b are means; n = 2
biological replicates, with dot plot overlaid. Data in panels e, h are
means ± s.e.m.; n = 3 biological replicates; two-tailed unpaired t-test.
The western blots in panels c, d, f, the ³H autography in panels d, e, the
mass spectrometry in panel d and the fluorescence imaging in panel g
were independently repeated three times with similar results. H4 was
used as a loading control and sample processing control in panels c-f.
For gel source data, see Supplementary Fig. 1. For imaging source data,

see Supplementary Fig. 2.

Extended Data Fig. 4 | H4K20me2-dosage-dependent interaction between
ORC1 and H4K20me2 nucleosomes. a, b, Docking of the SUV420H1 and H2A.Z
nucleosome structures shows the interaction between R257 of SUV420H1 and
E64 of H4 (a), and the interaction between K333 of SUV420H1 and D97 of H2A.Z
… d, Mass-spectrometry analysis of H4 histones with dimethylated Kc20,
produced through chemical methylation reactions in vitro; mass spectrometry
was performed once to confirm the methylation state. e, Western blot analysis
of the interaction between H2A.Z mononucleosomes with the H4Kc20me2
modification and ORC1 or ORC1 BAH-domain mutants. f, Western blots show
the interaction between ORC1 and H2A or H2A.Z mononucleosomes with
different H4Kc20me2 states. g, Western blots show the interaction between …
were independently repeated twice with similar results. H4 was used as a
loading control in panel c. H3 was used as a loading control in panels e-g. For
gel source data, see Supplementary Fig. 1.

Extended Data Fig. 5 | H2A.Z regulates H4K20me2 and ORC1 on a genome-
wide level. a, Box plot showing the dynamics of H4K20me2 at H4K20me2 peak
regions (n = 99,574) after knocking down both SUV420H1 and SUV420H2.
b, Venn diagram showing the overlap between H4K20me1, H4K20me2 and
H4K20me3 peaks genome wide. c, Heat maps and corresponding box plots
showing the dynamics of H4K20me1, H4K20me2 and H4K20me3 at 10-kilobase
regions around the centres of the H4K20me2 peaks (n = 99,574). d, e, Dot plot
showing a positive correlation between H2A.Z and H4K20me2 (d) and H2A.Z
and ORC1 (e) at H2A.Z peaks (n = 58,642). r, Pearson's correlation coefficient.
f, Mass-spectrometry analysis of chromatin H4K20me2 abundance after H2AFZ
knockdown. g, Western blot analysis of ORC1, SUV420H1, MCM2, H4K20me1,
H4K20me2 and H4K20me3 in the chromatin fraction from cells arrested at G1
phase. h, Box plot showing the nascent-strand (NS) signal of RNase-treated and
untreated samples from the NS peaks (n = 41,850). i, j, Dot plot showing the
dynamics of H4K20me2 (i; n = 99,574) and ORC1 (j; n = 100,917) after knocking
down H2AFZ. k, Genome-wide distribution of plus-ZD-ORC1 peaks (n = 20,978).
UTR, untranslated region. Data in panels a, c, h were analysed by two-tailed
Wilcoxon test. Data in panel f are means (n = 2 biological replicates) with dot
plots overlaid. Data in panels i, j were analysed by one-sided Fisher’s exact test
without adjustments for multiple comparisons. The western blots in panel g
were independently repeated twice with similar results, and H4 was used as a
loading control and sample processing control. For the box plots in panels a,
c, the centre lines represent the medians, the box limits are the 25th and 75th
percentiles, and the whiskers are the minimum to maximum values. For the
violin plots in panel h, the centres of the boxes represent the medians, the box
limits are the 25th and 75th percentiles, and the upper and lower limits show the
95% confidence interval. For gel source data, see Supplementary Fig. 1.

Extended Data Fig. 6 | H2A.Z regulates early-replication origins and
replication timing. a, Venn diagram showing the overlap between early-
replication domains identified from our BrdU-IP-seq data (n = 3,362) and
previously published Repli-seq datasets (n = 4,727). b, Distribution of ORC1
peaks and the 10-min BrdU signal (labelled in cells immediately after release
from G1/S arrest) in length-normalized early-replication domains. c, Real-time
PCR analysis of ChIP signals from H2A.Z, H4K20me2 and ORC1, and nascent-strand
signals in siNC or siH2A.Z cells at target 3 (an early replication origin).
d, Dot plot showing the
correlation between the replication timing of replication domains (n = 3,362)
identified by the 10-min BrdU signal and the dynamics of the 10-min BrdU signal
after H2AFZ knockdown. r, Pearson’s correlation coefficient. e, Genome tracks
of an early-replication domain show the decreased 10-min BrdU signal after
H2AFZ knockdown. RT, replication timing. Data in panel c are means ± s.e.m.;
n = 3 biological replicates; two-tailed unpaired t-test. The BrdU-seq
experiments in panels b, e were independently repeated twice with
similar results.

Extended Data Fig. 7 | H2A.Z regulates early-replication origins through
SUV420H1. a, FACS analysis of cell-cycle progression for siNC and siH2A.Z cells
released from G1/S arrest. b, Statistical results from panel a. To quantify cell-
cycle progression, we normalized the peak DAPI signal of all time points to the
peak DAPI signal at 0 h (G1/S arrest). We defined the DAPI signal at the G1/S
boundary as value 1; when cells entered G2/M phase, the value is near 2.
c, Western blot analysis of H4K20me levels in SUV420H1-knockdown cells
rescued by wild-type or mutant (R257A, K333A) SUV420H1. d, H4K20me2, NS
and BrdU levels at early origins (targets 1, 3) and late origins (target 2) in
SUV420H1-knockdown (‘Sh’) cells rescued by wild-type or mutant SUV420H1.
Data in panels b, d are means ± s.e.m.; n = 4 biological replicates in b; n = 3
biological replicates in d; two-tailed unpaired t-test. The FACS experiment in
panel a was independently repeated four times with similar results. The
western blots in panel c were independently repeated three times with similar
results. H4 was used as a loading control and sample processing control in
panel c. For gel source data, see Supplementary Fig. 1. For the FACS gating
strategy, see Supplementary Fig. 3.
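
The normalization described for panel b, together with the mean ± s.e.m. summary used throughout these legends, can be sketched as follows. The DAPI intensities are invented for illustration, and `normalize_to_zero_hour` and `mean_sem` are hypothetical helper names. The s.e.m. is taken here as the sample standard deviation divided by the square root of n, which is an assumption about the convention used.

```python
import math
import statistics


def normalize_to_zero_hour(peak_dapi_by_time):
    """Divide each time point's peak DAPI signal by the 0 h (G1/S arrest) value."""
    baseline = peak_dapi_by_time[0]
    return {t: value / baseline for t, value in peak_dapi_by_time.items()}


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of replicates."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical peak DAPI intensities (arbitrary units), keyed by hours after release.
replicate = {0: 50.0, 4: 75.0, 8: 100.0}
print(normalize_to_zero_hour(replicate))  # {0: 1.0, 4: 1.5, 8: 2.0}

# Hypothetical normalized values at one time point from n = 4 replicates.
mean, sem = mean_sem([1.4, 1.5, 1.6, 1.5])
print(round(mean, 3), round(sem, 3))
```

With this scaling, a value of 1 corresponds to the G1/S boundary and a value near 2 to cells that have completed replication, as stated in the legend.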
Extended Data Fig. 8 | H2A.Z is essential for DNA replication and cell
proliferation during T-cell activation. a, Diagram showing the construction
of mice with H2A.Z conditionally knocked out in T cells. b, FACS analysis of
T cells and B cells from the spleen of wild-type (H2A.Z^fl/fl) or H2A.Z CKO
(CD4^Cre H2A.Z^fl/fl) mice. c, Statistical analysis of T-cell numbers from panel b.
d, Left, FACS analysis of BrdU incorporation in CD8+ T cells from the lymph
node of wild-type and H2A.Z CKO mixed bone-marrow chimaeric mice. Right,
statistical analysis of the percentage of BrdU-incorporating cells. FSC, forward
scatter. e, FACS analysis of CFSE dilution in wild-type and H2A.Z CKO CD4+
T cells upon anti-CD3 and anti-CD28 stimulation for 72 h. f, Statistical analysis
of the percentage of divided cells from panel e. g, Cell-cycle analysis of wild-
type and H2A.Z CKO CD4+ T cells upon anti-CD3 and anti-CD28 stimulation for
48 h. PI, propidium iodide. h, Statistical analysis of cell-cycle distribution from
panel g. i, Heat map and box plots showing the H4K20me2 and NS signals in 10-
kb regions around H4K20me2 peaks (n = 53,788) and NS peaks (n = 26,283). j,
Western blots showing H4K20me2 levels in active T cells from wild-type and
H2A.Z CKO mice. k, Box plots showing H2A.Z, H4K20me2 and NS signals in
ZD-NS (n = 18,382) and ZI-NS (n = 7,901) regions. Data in panels c, d, f and h are
means ± s.e.m.; n = 4 biological replicates in panel c; n = 5 biological replicates
in panel d; n = 3 biological replicates in panels f, h; two-tailed unpaired t-test.
The data in panels i, k were analysed by two-tailed paired t-test and two-tailed
unpaired t-test, respectively. FACS analyses in panels b, d, e, g were
independently repeated four, five, three and three times, respectively, with
similar results. The western blots in panel j were independently repeated three
times with similar results, and H4 was used as a loading control and sample
processing control. For the box plots in panels i, k, the centre lines represent
medians, the box limits are the 25th and 75th percentiles, and the whiskers are
the minimum to maximum values. For gel source data, see Supplementary
Fig. 1. For FACS gating strategy, see Supplementary Fig. 3.


Extended Data Table 1 | Enriched DNA motifs in H2A.Z-regulated replication origins

Motif ID   P value       Sites
motif1     2.3 × 10^…    365
motif2     2.3 × 10^…    342
motif3     2.4 × 10^…    1529
motif4     2.1 × 10^…    1528
motif5     1 × 10^…      1410
motif6     1.2 × 10^…    2017
motif7     1.7 × 10^…    1922
motif8     1.1 × 10^…    1535
motif9     5.4 × 10^…    3651
motif10    2.2 × 10^…    1712
motif11    5.2 × 10^…    1387

(Logo column: sequence-logo images, not reproduced.)

The sequences of 400-bp genomic regions around the 20,978 plus-ZD-ORC1 peak summits were subjected to regulatory sequence analysis tools (RSATs) with default parameters.

Statistics


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection: Xcalibur 4.1.31.9; OLYMPUS FV1000 Ver.1.7a; Illumina HiSeq 2000/NovaSeq; BD FACSDiva 8.0.1


Data analysis: Proteome Discoverer software 1.4, PEAKS 8.5, Launchpad 2.4, FV Viewer 2.0, GraphPad Prism 6, FlowJo v.10, Bowtie v2.2.5, bedtools
v2.17.0, R v3.4.3, MACS v1.4.1, Python v2.7.6, tophat v2.2.1, cufflinks v2.2.1, samtools v1.2.1, deeptools v2.3.5, IGV v2.3, PeakSeq v1.3,
FASTX-Tools v0.0.13


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


ChIP-seq, NS-seq, BrdU-seq and RNA-seq data have been deposited in Gene Expression Omnibus (GEO) under accession number GSE134988. Figs. 1-2 and Extended 
Data Figs. 1-5, 7-8 have associated raw data in Supplementary Fig. 1. 


Field-specific reporting 


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 


[x] Life sciences    [ ] Behavioural & social sciences    [ ] Ecological, evolutionary & environmental sciences 


For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size No statistical methods were used to predetermine sample sizes. Sample sizes were determined based on previous experience. 
Data exclusions No data were excluded from the analysis. 
Replication All experiments were reliably reproduced. Each experiment was performed independently at least two times, but usually many more times. 


Randomization Mice were paired based on gender and age. 


Blinding Blinding was not performed due to the unambiguous nature of measurements and systematic analyses used in these experiments. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems Methods 
n/a | Involved in the study n/a | Involved in the study 
Antibodies ChIP-seq 
Eukaryotic cell lines Flow cytometry 
Palaeontology MRI-based neuroimaging 


Animals and other organisms 


Human research participants 


Clinical data 


Antibodies 


Antibodies used For WB, ChIP, ChIP-seq: 
Anti-H2A.Z, Abcam, ab4174, for WB, ChIP, ChIP-seq; 
Anti-H4, Millipore, #05-858, for WB; 
Anti-Flag, Sigma, s7452, for WB; 
Anti-ORC1, Abcam, ab60, for WB; 
Anti-ORC2, Abcam, ab68348 for WB; 
Anti-MCM2, Abcam, ab4461, for WB; 
Anti-RPA, Abcam, ab79398, for WB; 
Anti-H4K20me1, Abcam, ab9051, for WB and ChIP-seq; 
Anti-H4K20me2, Abcam, ab9052, for WB, ChIP and ChIP-seq; 
Anti-H4K20me3, Abcam, ab9053, for WB and ChIP-seq; 
Anti-Suv420H1, Novus, NBP1-78303, for WB; 
Anti-tubulin, Abcam, ab6046, for WB; 
Anti-His, Invitrogen, MA1-21315, for WB; 
Anti-H3, Cell signaling, #9715, for WB. 


For flow cytometry 

1. For surface staining of mouse splenocytes 

CD3 FITC (Clone: 145-2C11; Cat. 11-0031-82; Lot:4338511; eBioscience) 
B220 PE (Clone: RA3-6B2; Cat. 12-0452; Lot: E01249-1634; eBioscience) 
Anti-BrdU FITC (Clone: B44; Cat. 347583; Lot: 6090520; BD) 

Live/Dead (Cat. L34959; Lot: 1921586; invitrogen) 


2. For T cell Isolation and the analysis of proliferation and cell cycle 
CD4 PerCp-Cy5.5 (Clone: RM4-5; Cat. 45-0042; Lot: 4304295; eBioscience) 
CD8 APC (Clone: 53-6.7; Cat. 17-0081; Lot: E07057-1635; eBioscience) 
CD25 PE (Clone: PC61.5; Cat. 12-0251; Lot: 4277529; eBioscience) 

CD44 APC/Cy7 (Clone: IM7; Cat. 103028; Lot: B262797; Biolegend) 

CD62L AF700 (Clone: MEL-14; Cat. 56-0621-821; Lot: 4306534; eBioscience) 
Anti-mouse CD3e (Clone: 145-2C11; Cat. 100314 ; Lot:B267827; Biolegend) 
Anti-mouse CD28 (Clone: 37.51; Cat. 102112; Lot:B261983; Biolegend) 


Validation All antibodies used are commonly used in the field and have been validated in previous publications or by the manufacturer. 
References and manufacturer validations can be found here: 
Anti-H2A.Z (Abcam ab4174): https://www.abcam.com/histone-h2az-antibody-chip-grade-ab4174.html#top-496 
Anti-H4 (Millipore #05858): https://www.merckmillipore.com/CN/zh/product/Anti-Histone-H4-Antibody-pan-clone-62-141-13- 
rabbit-monoclonal, MM_NF-05-858?ReferrerURL=https%3A%2F%2Fwww.google.com%2F&bd=1 
Anti-H3 (Cell signaling #9715): https://www.cellsignal.com/products/primary-antibodies/histone-h3-antibody/9715 
Anti-H4K20me1 (Abcam ab9051): https://www.abcam.com/histone-h4-mono-methyl-k20-antibody-chip-grade-ab9051.html 
Anti-H4K20me2 (Abcam ab9052): https://www.abcam.com/histone-h4-di-methyl-k20-antibody-chip-grade-ab9052.html 
Anti-H4K20me3 (Abcam ab9053): https://www.abcam.com/Histone-H4-tri-methyl-K20-antibody-ChIP-Grade-ab9053/ 
reviews/45378 
Anti-H3K36me3 (Abcam Ab9050): https://www.abcam.com/Histone-H3-tri-methyl-K36-antibody-ChIP-Grade-ab9050/ 
reviews/61366 
Anti-ORC1 (Abcam ab60): https://www.abcam.com/orc1-antibody-7f61-chip-grade-ab60.html 
Anti-PCNA (Abcam ab29): https://www.abcam.com/pcna-antibody-pc10-ab29.html 
Anti-BrdU (BD Biosciences BD44): https://www.bdbiosciences.com/us/applications/research/apoptosis/purified-antibodies/ 
purified-mouse-anti-brdu-b44/p/347580 
Anti-Flag (Sigma F7452): https://www.sigmaaldrich.com/catalog/product/sigma/f7425?lang=zh&region=CN 
Anti-ORC2 (Abcam ab68348): https://www.abcam.com/orc2-antibody-ab68348.html 
Anti-MCM2 (Abcam ab4461): https://www.abcam.com/mcm2-antibody-ab4461.html 
Anti-RPA (Abcam ab79398): https://www.abcam.com/rpa70-antibody-epr3472-ab79398.html 
Anti-Suv420H1 (Novus, NBP1-78303): https://www.novusbio.com/products/suv420h1-antibody_nbp1-78303 
Anti-Tubulin (Abcam Ab6046): https://www.abcam.com/beta-tubulin-antibody-loading-control-ab6046.html 
Anti-His (Invitrogen MA1-21315): https://www.fishersci.com/shop/products/anti-6x-his-epitope-tag-clone-his-h8/MA121315? 
gclid=CjOKCQjwokzsBRCSARISAITCWXEHNjo-RWL5Ce9YPTUOXgXAPKmQjn- 
y3MVTyALYYMfJCpWeli4DGwkaAmT7EALW_wcB&ef_id=CjOKCQjwoKzsBRC5ARISAITCwXEHNjo-RWL5Ce9YPTUOXGXAPKmQjn- 
y3MVTyALYYMfJCpWgli4DGwkaAmT7EALW_wcB:G:s&cid=SEM_GAW_20190909_CB88IN&ppc_id=FisherSciNonbrand_goog 649 
2004464_80722830231 AntibodiesDSA_b_381598720312_791621048115855394&s_kwcid=AL!4428!3!381598720312!b!!g!! 
Anti-CD3-FITC (eBioscience 11-0031-82 clone 145-2C11): https://www.thermofisher.com/cn/zh/antibody/product/CD3e- 
Antibody-clone-145-2C11-Monoclonal/11-0031-82 
Anti-CD4-PerCp-Cy5.5 (eBioscience 45-0042 clone RM4-5): https://www.thermofisher.com/cn/zh/antibody/product/CD4- 
Antibody-clone-RM4-5-Monoclonal/35-0042-82 
Anti-CD8-APC (Biolegend 17-0081 clone 53-6.7): https://www.thermofisher.com/cn/zh/antibody/product/CD8-alpha-Antibody- 
clone-5H10-Monoclonal/MCD0805 
Anti-CD44-APC/Cy7 (Biolegend 103028): https://www.biolegend.com/en-us/products/apc-cy7-anti-mouse-human-cd44- 
antibody-3933 
Anti-CD62L-AF700 (eBioscience 56-0621-821 Clone MEL-14): https://www.thermofisher.com/cn/zh/antibody/product/CD62L-L- 
Selectin-Antibody-clone-MEL-14-Monoclonal/56-0621-82 
Anti-B220-PE (eBioscience 12-0452 clone RA3-6B2): https://www.thermofisher.com/antibody/product/12-0452-85.html? 
CID=AFLCA-12-0452-85 
Anti-CD25-PE (eBioscience 12-0251-82): https://www.thermofisher.com/cn/zh/antibody/product/CD25-Antibody-clone-PC61-5- 
Monoclonal/12-0251-82 
Live/Dead (invitrogen L34959): https://www.thermofisher.com/order/catalog/product/L34959 
Anti-mouse CD28 (Biolegend 102112): https://www.biolegend.com/en-us/products/purified-anti-mouse-cd28-antibody-117 
Anti-mouse CD3 (Biolegend 100314): https://www.labome.com/product/BioLegend/100314.html 


Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s) The HeLa cell line was originally obtained from ATCC. 

Authentication The identity of the HeLa cell line was frequently checked by its morphological features, but the line was not authenticated. 

Mycoplasma contamination The cell lines were regularly tested for mycoplasma contamination. If contamination was detected, the cell 
lines were immediately treated with Plasmocin (InvivoGen). 

Commonly misidentified lines (See ICLAC register) No commonly misidentified cell lines were used. 


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals H2A.Z flox mice were purchased from the RIKEN BioResource Center (RBRC05765). CD4-Cre transgenic mice were purchased from The 
Jackson Laboratory. H2A.Zflox/flox CD4-Cre mice were generated by crossing H2A.Zflox/flox mice with CD4-Cre transgenic 
mice. All mice were housed under specific pathogen-free conditions in the animal care facilities at the Institute of Biophysics, 
Chinese Academy of Sciences. 


All mice used for experiments were 6-10 weeks old. Age- and sex-matched female or male mice were used for each experiment. 


Wild animals No wild animals were involved. 
Field-collected samples No field-collected samples were used. 
Ethics oversight All animal experiments were performed in accordance with the guidelines of the Institute of Biophysics, Chinese Academy of 
Sciences, using protocols approved by the Institutional Laboratory Animal Care and Use Committee. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


ChIP-seq 


Data deposition 


Confirm that both raw and final processed data have been deposited in a public database such as GEO. 




Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks. 


Data access links https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE134988 
May remain private before publication. 


Files in database submission GSM3983183 hela-siNC-H2A.Z_rep1 
GSM3983184 hela-siNC-H2A.Z_rep2 
GSM3983185 hela-siH2A.Z-H2A.Z_rep1 
GSM3983186 hela-siH2A.Z-H2A.Z_rep2 
GSM3983187 hela-siNC-H4K20me1_rep1 
GSM3983188 hela-siNC-H4K20me1_rep2 
GSM3983189 hela-siNC-H4K20me2_rep1 
GSM3983190 hela-siNC-H4K20me2_rep2 
GSM3983191 hela-siNC-H4K20me3_rep1 
GSM3983192 hela-siNC-H4K20me3_rep2 
GSM3983193 hela-siNC-native-input 
GSM3983194 hela-siH2A.Z-native-input 
GSM3983195 hela-siH2A.Z-H4K20me1_rep1 
GSM3983196 hela-siH2A.Z-H4K20me1_rep2 
GSM3983197 hela-siH2A.Z-H4K20me2_rep1 
GSM3983198 hela-siH2A.Z-H4K20me2_rep2 
GSM3983199 hela-siH2A.Z-H4K20me3_rep1 
GSM3983200 hela-siH2A.Z-H4K20me3_rep2 
GSM3983201 hela-siNC-ORC1_rep1 
GSM3983202 hela-siNC-ORC1_rep2 
GSM3983203 hela-siH2A.Z-ORC1_rep1 
GSM3983204 hela-siH2A.Z-ORC1_rep2 
GSM3983205 hela-siNC-NS_rep1 
GSM3983206 hela-siNC-NS_rep2 
GSM3983207 hela-siNC-Rnase-treated-NS_rep1 
GSM3983208 hela-siNC-Rnase-treated-NS_rep2 
GSM3983209 hela-siNC-genome 


GSM3983210 hela-siH2A.Z-NS 
GSM3983211 hela-siH2A.Z-Rnase-treated-NS 
GSM3983212 hela-siH2A.Z-genome 
GSM3983213 hela-siNC-BrdU-10min_rep1 
GSM3983214 hela-siNC-BrdU-10min_rep2 
GSM3983215 hela-siNC-BrdU-1h 
GSM3983216 hela-siNC-BrdU-3h 
GSM3983217 hela-siNC-BrdU-6h 
GSM3983218 hela-siH2A.Z-10min_rep1 
GSM3983219 hela-siH2A.Z-10min_rep2 
GSM3983220 hela-siH2A.Z-1h 
GSM3983221 hela-siH2A.Z-3h 

GSM3983222 hela-siH2A.Z-6h 

GSM3983223 hela-siNC-RNA-seq_rep1 

GSM3983224 hela-siNC-RNA-seq_rep2 

GSM3983225 hela-siH2A.Z-RNA-seq_rep1 

GSM3983226 hela-siH2A.Z-RNA-seq_rep2 

GSM3983227 mouse-active-T-cell-siNC-H4K20me2 
GSM3983228 mouse-active-T-cell-siH2A.Z-H4K20me2 
GSM3983229 mouse-active-T-cell-siNC-input 
GSM3983230 mouse-active-T-cell-siH2A.Z-input 
GSM3983231 mouse-active-T-cell-siNC-NS 

GSM3983232 mouse-active-T-cell-siNC-Rnase-treated-NS 
GSM3983233 mouse-active-T-cell-siH2A.Z-NS 


GSM3983234 mouse-active-T-cell-siH2A.Z-Rnase-treated-NS 
GSM3993419 hela-WT-input 


Genome browser session (e.g. UCSC) 
Not used 


Methodology 


Replicates 
BrdU, ORC1 and histone ChIP-seq experiments were performed in duplicate with an input control for each experiment. 
Nascent strand (NS) seq experiments were performed in duplicate with an input control for each experiment. 


Sequencing depth 
hela-WT-H2A.Z_rep1, single-end, read number: 29217798, read length: 50 bp 
hela-WT-H2A.Z_rep2, single-end, read number: 36786000, read length: 50 bp 
hela-H2A.Z-KD-H2A.Z_rep1, single-end, read number: 30506757, read length: 50 bp 
hela-H2A.Z-KD-H2A.Z_rep2, single-end, read number: 30327697, read length: 50 bp 
hela-WT-H4K20me1_rep1, single-end, read number: 27007998, read length: 50 bp 
hela-WT-H4K20me2_rep1, single-end, read number: 46342466, read length: 50 bp 
hela-WT-H4K20me2_rep2, single-end, read number: 21530865, read length: 50 bp 
hela-WT-H4K20me3_rep1, single-end, read number: 35063713, read length: 50 bp 
hela-WT-input_rep1, single-end, read number: 35063713, read length: 50 bp 
hela-WT-input_rep2, single-end, read number: 30813565, read length: 50 bp 
hela-H2A.Z-KD-input_rep1, single-end, read number: 33781291, read length: 50 bp 
hela-H2A.Z-KD-input_rep2, single-end, read number: 31197655, read length: 50 bp 
hela-H2A.Z-KD-H4K20me1_rep1, single-end, read number: 33187537, read length: 50 bp 
hela-H2A.Z-KD-H4K20me2_rep1, single-end, read number: 41079876, read length: 50 bp 
hela-H2A.Z-KD-H4K20me2_rep2, single-end, read number: 22572525, read length: 50 bp 
hela-H2A.Z-KD-H4K20me3_rep1, single-end, read number: 32816918, read length: 50 bp 
hela-WT-ORC1_rep1, single-end, read number: 26780831, read length: 50 bp 
hela-WT-ORC1_rep2, single-end, read number: 27504126, read length: 50 bp 
hela-H2A.Z-KD-ORC1_rep1, single-end, read number: 39982452, read length: 50 bp 
hela-H2A.Z-KD-ORC1_rep2, single-end, read number: 21389952, read length: 50 bp 
hela-WT-NS_rep1, single-end, read number: 22799872, read length: 50 bp 
hela-WT-NS_rep2, single-end, read number: 24639890, read length: 50 bp 
hela-WT-Rnase-treated-NS_rep1, single-end, read number: 18786498, read length: 50 bp 
hela-WT-Rnase-treated-NS_rep2, single-end, read number: 53131336, read length: 51 bp 
hela-WT-genome-NS_rep1, single-end, read number: 16940577, read length: 50 bp 
hela-WT-genome-NS_rep2, single-end, read number: 36229449, read length: 50 bp 
hela-H2A.Z-KD-NS_rep1, single-end, read number: 33595669, read length: 50 bp 
hela-H2A.Z-KD-Rnase-treated-NS_rep1, single-end, read number: 23120958, read length: 50 bp 
hela-H2A.Z-KD-genome-NS_rep1, single-end, read number: 25083887, read length: 50 bp 
hela-WT-BrdU-10min_rep1, single-end, read number: 35791042, read length: 50 bp 
hela-WT-BrdU-10min_rep2, single-end, read number: 23413789, read length: 50 bp 
hela-WT-BrdU-1h_rep1, single-end, read number: 42484612, read length: 50 bp 
hela-WT-BrdU-3h_rep1, single-end, read number: 33748760, read length: 50 bp 
hela-WT-BrdU-6h_rep1, single-end, read number: 21086656, read length: 50 bp 
hela-H2A.Z-KD-10min_rep1, single-end, read number: 41527806, read length: 50 bp 
hela-H2A.Z-KD-10min_rep2, single-end, read number: 24074710, read length: 50 bp 
hela-H2A.Z-KD-1h_rep1, single-end, read number: 47490348, read length: 50 bp 
hela-H2A.Z-KD-3h_rep1, single-end, read number: 25062487, read length: 50 bp 
hela-H2A.Z-KD-6h_rep1, single-end, read number: 19341066, read length: 50 bp 
hela-WT-RNA-seq, paired-end, read number: 41904139, read length: 150 bp 
hela-H2A.Z-KD-RNA-seq, paired-end, read number: 44240416, read length: 150 bp 
mouse-active-T-cell-WT-H4K20me2, single-end, read number: 28935057, read length: 50 bp 
mouse-active-T-cell-H2A.Z-cKO-H4K20me2, single-end, read number: 24039043, read length: 50 bp 
mouse-active-T-cell-WT-input, single-end, read number: 24703906, read length: 50 bp 
mouse-active-T-cell-H2A.Z-cKO-input, single-end, read number: 28063843, read length: 50 bp 
mouse-active-T-cell-WT-NS, paired-end, read number: 41513960, read length: 150 bp 
mouse-active-T-cell-H2A.Z-cKO-NS, paired-end, read number: 47561656, read length: 150 bp 
mouse-active-T-cell-WT-NS, paired-end, read number: 62444070, read length: 150 bp 
mouse-active-T-cell-H2A.Z-cKO-NS, paired-end, read number: 57631120, read length: 150 bp 


Antibodies 
Anti-H2A.Z, Abcam, ab4174; Anti-H4K20me1, Abcam, ab9051; Anti-H4K20me2, Abcam, ab9052; Anti-H4K20me3, Abcam, 
ab9053; Anti-BrdU, BD, BD44. The ORC1 ChIP used M-280 streptavidin Dynabeads. 


Peak calling parameters 
Reads were uniquely mapped to the genome using bowtie2. Peaks were called using MACS (--shiftsize=75) and PeakSeq (FDR 
below 0.5%). 


Data quality 
High-quality reads were retained using Fastx_toolkit; unique reads were used for peak calling with FDR below 0.5%. 


Software 
Bowtie v2.2.5, bedtools v2.17.0, R v3.4.3, MACS v1.4.1, python v2.7.6, tophat v2.2.1, cufflinks v2.2.1, samtools v1.2.1, 
deeptools v2.3.5, IGV v2.3, PeakSeq v1.3, FASTX-Tools v0.0.13 


Flow Cytometry 


Plots 


Confirm that: 
The axis labels state the marker and fluorochrome used (e.g. CD4-FITC). 
The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers). 
All plots are contour plots with outliers or pseudocolor plots. 
A numerical value for number of cells or percentage (with statistics) is provided. 


Methodology 


Sample preparation 
Spleen and lymph node cells from mice were harvested and homogenized. Cells were suspended in FACS buffer (2% FBS in PBS) 
and the cell suspensions were filtered through a 70 µm cell strainer. The suspension was washed with cold FACS buffer and centrifuged 
at 500g for 5 min. 


Instrument 
LSRFortessa, FACSAria III 


Software 
BD FACSDiva 8.0.1 software was used for data collection and FlowJo v.10 was used for data analysis. 


Cell population abundance 
Among single live lymphocytes from spleen and lymph nodes, ~25% were CD4+ T cells and 20% were CD8+ T cells. Among CD4+ T 
cells, ~85% were CD62L+CD44- naive T cells. 

Among spleen lymphocytes, ~20% were CD3+ T cells and ~50% were B220+ B cells in WT mice; ~10% were CD3+ T cells in cKO 
mice. 

For the cell cycle, the WT group contained ~65% G1/G0 phase cells, ~25% S phase cells and ~10% M phase cells; the H2A.Z KO group 
contained ~80% G1/G0 phase cells, ~10% S phase cells and ~10% M phase cells. 

For flow sorting, post-sort cells were analyzed on a BD FACSAria III and the purity was at least 95%. 


Gating strategy 
I. For T cell analysis and sorting 
-a. For spleen lymphocytes, T cells were gated with CD3 and B cells were gated with B220. 

-b. For sorting naive T cells, single live lymphocytes were first gated with CD4 and CD8, then naive T cells were gated with CD44 
and CD62L; CD44-low, CD62L-high cells were classed as naive T cells. 

II. For CFSE analysis 
Divided cells: single live lymphocytes were gated as CFSE+, and cells with diluted CFSE were classed as divided cells. 

III. For cell cycle analysis 
Single live lymphocytes were gated with BrdU and PI. BrdU-low, PI-low cells were classed as G1/G0 phase cells; BrdU-high, 
PI-intermediate cells were classed as S phase cells; BrdU-low, PI-high cells were classed as M phase cells. 


Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 
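
The cell-cycle gating rule stated in the flow cytometry section above (BrdU low and PI low for G1/G0, BrdU high and PI intermediate for S, BrdU low and PI high for M) can be written as a small classifier. The sketch below is illustrative only: the function names, the normalised 0-1 intensity scale and the cut-off values are assumptions made for the example and are not taken from the authors' gating, which was done in FlowJo.

```python
from collections import Counter


def classify_cell_cycle(brdu, pi, brdu_cut=0.5, pi_low=0.33, pi_high=0.66):
    """Assign a phase from normalised BrdU and PI intensities.

    The thresholds are hypothetical placeholders, not the authors' gates.
    """
    brdu_high = brdu >= brdu_cut
    if not brdu_high and pi < pi_low:
        return "G1/G0"          # BrdU low, PI low
    if brdu_high and pi_low <= pi < pi_high:
        return "S"              # BrdU high, PI intermediate
    if not brdu_high and pi >= pi_high:
        return "M"              # BrdU low, PI high
    return "ungated"            # falls outside the three gates


def phase_fractions(events):
    """Return the fraction of events assigned to each phase."""
    counts = Counter(classify_cell_cycle(b, p) for b, p in events)
    total = sum(counts.values())
    return {phase: n / total for phase, n in counts.items()}


# Made-up events (BrdU, PI) for illustration; not measured data.
events = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.5), (0.1, 0.9)]
fractions = phase_fractions(events)
```

Events that match none of the three gates are reported as "ungated" rather than forced into a phase, which keeps the reported percentages tied to the stated gates.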
Corrections & amendments 


Author Correction: 
Weak average liquid-cloud-water response to 
anthropogenic aerosols 

https://doi.org/10.1038/s41586-019-1838-3 


Correction to: Nature https://doi.org/10.1038/s41586-019-1423-9 


Published online 31 July 2019 


Velle Toll, Matthew Christensen, Johannes Quaas & Nicolas Bellouin 


In this Article, a coding mistake occurred when calculating 
$-\Delta\ln\mathrm{COD}/\Delta\ln R_\mathrm{e}$ and $\Delta\ln\mathrm{LWP}/\Delta\ln\mathrm{CDNC}$, 
in which COD denotes the cloud optical depth; $R_\mathrm{e}$ denotes 
the cloud droplet effective radius; LWP denotes the liquid water path; 
and CDNC denotes the cloud droplet number concentration. The 
natural logarithm of those cloud properties was mistakenly taken 
before, instead of after, calculating the track segment average. The 
mistake has been corrected in Figs. 3b, 4, 6 and Extended Data Tables 
2, 4 of the original Article, and the incorrect, published figures and 
tables are shown as Figs. 1-5 of this Amendment, for transparency to 
readers. The conclusions of the paper, including the main conclusion 
that changes in cloud water caused by aerosols exert a weak 
climate-warming effect, and all other figures and tables, are not affected. 
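
The order of operations matters because the mean of logarithms is not the logarithm of the mean. The minimal sketch below contrasts the two orders for one track segment. The function names and the liquid-water-path values are hypothetical and chosen only for illustration; this is not the authors' analysis code.

```python
import math
from statistics import mean


def log_of_mean(values):
    """Corrected order: average over the track segment, then take ln."""
    return math.log(mean(values))


def mean_of_log(values):
    """Mistaken order: take ln of each value, then average."""
    return mean(math.log(v) for v in values)


# Hypothetical LWP values (g m^-2) along one track segment; not data from the Article.
segment_lwp = [40.0, 80.0, 160.0]

corrected = log_of_mean(segment_lwp)   # ln of the arithmetic mean
mistaken = mean_of_log(segment_lwp)    # ln of the geometric mean

# For positive values the mistaken order never exceeds the corrected one
# (Jensen's inequality), so the two only agree when all values are equal.
assert mistaken <= corrected
```

Because the gap between the two quantities depends on how variable the values are within a segment, it need not cancel when differences such as $\Delta\ln\mathrm{LWP}$ are formed between polluted and unpolluted segments.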

The correction has minor effects on the results of the paper, which 
remain qualitatively the same. The radiative forcing exerted by LWP 
adjustments is now estimated at $+0.15\ \mathrm{W\,m^{-2}}$, instead of $+0.12\ \mathrm{W\,m^{-2}}$. 
Consequently, the observed decrease in cloud water now offsets 29%, 
up from 23%, of the global climate-cooling effect caused by 
aerosol-induced increases in the concentration of cloud droplets. In addition, 
the y-axis label in Fig. 4 should be $\Delta\ln\mathrm{LWP}/\Delta\ln\mathrm{CDNC}$ without a minus 
sign. The original Article has been corrected online. 
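
As a quick consistency check on the numbers above, the sketch below assumes that the quoted offset is simply the LWP forcing divided by the magnitude of the cooling from increased droplet concentration. That definition and the function name are assumptions made for this example; the notice does not state the cooling value itself.

```python
def implied_cooling_magnitude(lwp_forcing, offset_fraction):
    """Magnitude of the cooling implied by lwp_forcing = offset_fraction * cooling.

    Assumes the offset is a plain ratio of the two forcings (W m^-2).
    """
    return lwp_forcing / offset_fraction


original = implied_cooling_magnitude(0.12, 0.23)   # published values
corrected = implied_cooling_magnitude(0.15, 0.29)  # corrected values

print(f"original: {original:.2f} W m^-2, corrected: {corrected:.2f} W m^-2")
```

Under that assumption both pairs of numbers imply a similar cooling magnitude, which is consistent with the statement that only the LWP adjustment changed.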


[Figure: Original Fig. 3b and Corrected Fig. 3b. Legend: Industry, stratiform, land; Large industry, stratiform, land; Fires, stratiform, land; Volcano, stratiform, ocean; Volcano, trade-wind cumulus, ocean; Ship, stratiform, ocean. Axis: $-\Delta\ln\mathrm{COD}/\Delta\ln R_\mathrm{e}$.] 


Fig. 1| This figure displays the corrected and the incorrect published Fig. 3b of the original Article. 


[Figure: Original Fig. 4 and Corrected Fig. 4. Horizontal axes: Relative humidity above clouds (%), $R_\mathrm{e}$ (µm), Cloud-top height (km), LWP (g m$^{-2}$). Vertical axis: $\Delta\ln\mathrm{LWP}/\Delta\ln\mathrm{CDNC}$ (printed with a minus sign in the original figure). Curves: Cloud-top height, LWP, $R_\mathrm{e}$, Relative humidity.] 


Fig. 2| This figure displays the corrected and the incorrect published Fig. 4 of the original Article. 


[Figure: Original Fig. 6 and Corrected Fig. 6, global maps. Panel a: Radiative forcing from Twomey effect (colour scale -3 to 0 W m$^{-2}$). Panel b: Radiative forcing from LWP change (colour scale 0 to 0.6 W m$^{-2}$).] 


Fig. 3| This figure displays the corrected and the incorrect published Fig. 6 of the original Article. 


Original Extended Data Table 2 

Type of track             Number of track observations   Cloud top height [m]   Relative humidity [%]   $\Delta\ln\mathrm{LWP}/\Delta\ln\mathrm{CDNC}$ 
Industry St land          6478                           2259                   42.6                    -0.071 (0.003) 
Large industry St land    984                            2862                   34.2                    -0.215 (0.018) 
Fires St land             1389                           2532                   46.5                    -0.088 (0.006) 
Volcano St ocean          2521                           1311                   39.7                    -0.107 (0.007) 
Volcano Cu ocean          831                            1471                   76.1                    +0.051 (0.012) 
Ship St ocean             793                            721                    26.6                    -0.021 (0.013) 


Corrected Extended Data Table 2 

Type of track             Number of track observations   Cloud top height [m]   Relative humidity [%]   $\Delta\ln\mathrm{LWP}/\Delta\ln\mathrm{CDNC}$ 
Industry St land          6478                           2259                   42.6                    -0.098 (0.003) 
Large industry St land    984                            2862                   34.2                    -0.249 (0.017) 
Fires St land             1389                           2532                   46.5                    -0.081 (0.006) 
Volcano St ocean          2521                           1311                   39.7                    -0.109 (0.007) 
Volcano Cu ocean          831                            1471                   76.1                    -0.021 (0.012) 
Ship St ocean             793                            721                    26.6                    -0.028 (0.013) 


Fig. 4| This figure displays the corrected and the incorrect published Extended Data Table 2 of the original Article. 


Original Extended Data Table 4 

$R_\mathrm{e}$ interval [µm]   $\Delta\ln\mathrm{LWP}/\Delta\ln\mathrm{CDNC}$, volcano and ship tracks over ocean   $\Delta\ln\mathrm{LWP}/\Delta\ln\mathrm{CDNC}$, industry and fire tracks over land 
Re<10.5                -0.115 (0.013)                                  -0.124 (0.014) 
10.5<=Re<13.5          -0.140 (0.012)                                  -0.145 (0.005) 
13.5<=Re<16.5          -0.068 (0.015)                                  -0.104 (0.006) 
16.5<=Re<19.5          -0.004 (0.017)                                  -0.043 (0.006) 
19.5<=Re<22.5          +0.052 (0.033)                                  -0.035 (0.009) 
Re>=22.5               +0.106 (0.035)                                  -0.008 (0.017) 


Corrected Extended Data Table 4 

$R_\mathrm{e}$ interval [µm]   $\Delta\ln\mathrm{LWP}/\Delta\ln\mathrm{CDNC}$, volcano and ship tracks over ocean   $\Delta\ln\mathrm{LWP}/\Delta\ln\mathrm{CDNC}$, industry and fire tracks over land 
Re<10.5                -0.145 (0.012)                                  -0.129 (0.009) 
10.5<=Re<13.5          -0.165 (0.010)                                  -0.161 (0.005) 
13.5<=Re<16.5          -0.094 (0.012)                                  -0.122 (0.006) 
16.5<=Re<19.5          -0.026 (0.016)                                  -0.062 (0.006) 
19.5<=Re<22.5          +0.056 (0.022)                                  -0.058 (0.009) 
Re>=22.5               +0.116 (0.030)                                  -0.026 (0.011) 


Fig. 5| This figure displays the corrected and the incorrect published 
Extended Data Table 4 of the original Article. 


Author Correction: 
Molecular architecture of lineage allocation and 
tissue organization in early mouse embryo 


https://doi.org/10.1038/s41586-019-1887-7 


Correction to: Nature https://doi.org/10.1038/s41586-019-1469-8 


Published online 7 August 2019 


Guangdun Peng, Shengbao Suo, Guizhong Cui, Fang Yu, Ran Wang, 
Jun Chen, Shirui Chen, Zhiwen Liu, Guoyu Chen, Yun Qian, 
Patrick P. L. Tam, Jing-Dong J. Han & Naihe Jing 


In Extended Data Fig. 6a of this Letter, for consistency with the main 
figures, the labels ‘A’ and ‘P’ should be ‘Epi1’ and ‘Epi2’, respectively, 
and labels ‘EA’ and ‘EP’ should be ‘En1’ and ‘En2’, respectively. In 
Supplementary Table 6, the content of the table was wrongly calculated 
owing to an error during coding. In Supplementary Table 7, the label 
‘Epi1.E7.5’ should be ‘Ect1.E7.5’ for consistency with the main figures. 
These errors do not affect the conclusions of the Letter. All errors have 
been corrected online. 


Publisher Correction: 
In vivo imaging of mitochondrial membrane potential in 
non-small-cell lung cancer 


https://doi.org/10.1038/s41586-019-1890-z 


Correction to: Nature https://doi.org/10.1038/s41586-019-1715-0 


Published online 30 October 2019 


Milica Momcilovic, Anthony Jones, Sean T. Bailey, 

Christopher M. Waldmann, Rui Li, Jason T. Lee, Gihad Abdelhady, 
Adrian Gomez, Travis Holloway, Ernst Schmid, David Stout, 
Michael C. Fishbein, Linsey Stiles, Deepa V. Dabir, 

Steven M. Dubinett, Heather Christofk, Orian Shirihai, 

Carla M. Koehler, Saman Sadeghi & David B. Shackelford 


In this Article, owing to an error during the production process, author 
Jason T. Lee was erroneously associated with affiliation 3 (The Mouse 
Phase I Unit, Lineberger School of Medicine at the University of North 
Carolina Chapel Hill, Chapel Hill, NC, USA), and should have instead 
been associated with affiliations 2, 4 and 5 (Department of Molecular 
and Medical Pharmacology, David Geffen School of Medicine at the 
University of California, Los Angeles, CA, USA; Crump Institute for 
Molecular Imaging, David Geffen School of Medicine at the University 
of California, Los Angeles, CA, USA; and Jonsson Comprehensive Cancer 
Center, David Geffen School of Medicine at the University of California, 
Los Angeles, CA, USA). This has been corrected online. 


The fate of whales provided a hook for helping policymakers to understand how science works in practice. 


POLICY TALES AND THE 
SECRET LIFE OF WHALES 


Palaeontologist finds a way to convey science to business 
leaders at the World Economic Forum. By Nick Pyenson 


As a palaeontologist who works with fossils of large, extinct ocean 
predators, I tend to think that the story of our future has already been 
written in the geological past. The same rocks that preserve the remains 
of ancient whales tell us about dramatic sea-level rises that might be 
matched in our future, if global warming continues. As we begin to 
encounter geological-scale global changes in our own lifetimes, the past 
of this planet is a guide to what might happen. It’s hard for me to accept 
that scientists can explain how whale bones end up on mountain tops but 
we can’t find leadership to forestall glacial melting. 

Leadership was definitely on my mind when I attended the World Economic 
Forum (WEF) Annual Meeting of the New Champions 2019 this past July in 
Dalian, China, to talk about the secret life of whales to non-scientists 
from the business and policy fields. I thought the narrative of where 
whales originated, and how their fate today is inextricably linked with 
ours, would have traction at the WEF. I planned to use the fate of whales 
not just as a hook for amazing facts, but as a vehicle for understanding 
how science works in practice. I was unsure about how my presentation 
would land; after all, many elected leaders pay little attention to 
scientific evidence, often wilfully undermining it or happily ignoring it. 


© 2020 Springer Nature Limited. All rights reserved. 

The Annual Meeting of the New Champions 2019, held by the World Economic Forum in Dalian, China. 


I knew that the WEF was important: much of its influence comes not from 
the named attendees, but from using its platform and network to effect 
change across many areas of governance. I had also joined the WEF’s 
Young Scientists community, which drew together a select group of 
early-career scientists from around the world for a two-year ‘journey’ 
(now three years). I was reassured by the other young scientists, who 
shared my hope for science at the WEF; their presence in the audience 
gave me much-needed support. Fortunately, my talk went down well. 

So, what business do scientists have at 
a meeting such as the WEF? And what are 
the lessons for scientists who want to com- 
municate their relevance and the overall 
importance of science to global leaders? 


Your expertise matters. Scientists at 
the cutting edge of their fields have cred- 
ibility that is hard-won and long-last- 
ing. Use the opportunity granted by 
credibility to share information with peo- 
ple outside your normal scientific network. 


Scientific findings have value. They don’t 
necessarily show up in investor reports, but 
have ways of being durable and surprising. 
Scientists should speak about the value of 
scientific knowledge so that it isn’t opaque 
or discounted as irrelevant. 

Stories of discovery are exciting. Whether 
it involves pandemics or neutrinos, don’t 
underestimate the thrill of discoveries. 
Scientists are experts at pursuing knowledge, 
and we should speak clearly about how 
we work things out. Part of the excitement 
is not always knowing the answers to our 
questions, together with the unexpected 
challenges and insights along the way. Told 
correctly, these testimonies can inspire and 
motivate a range of audiences for a long time. 


Facts need narrative. It’s clear that facts 
aren’t always enough to capture interest 
or sway public opinion. Scientists can use 
the first-person narrative in unique ways. 
Combining subject expertise and storytelling 
savvy can give scientists influence in these 
multi-stakeholder meetings. The best 
presentations by scientists in Dalian did a 
lot more than merely translate jargon: the 
scientists used their subject knowledge and 
the power of narrative to captivate and 
connect with their audience. Giving entertaining, 
engaging talks requires knowing the facts, but 
also recognizing what details to omit. 

Scientists, of course, aren't great at 
everything. Although science has a part to play 
in nearly all of the 17 United Nations Sustainable 
Development Goals, scientists alone would 
have a hard time writing them. Scientists 
who want to step up to the multi-stakeholder 
table need to understand the priorities 
of political and business leaders; after all, we 
can’t expect world leaders to become scientific 
experts in their spare time. 

The big decisions of our time, including 
how we respond to future sea-level rise, need 
to be made by people who understand the 
complexity of the world, and who possess both 
confidence with creative problem-solving and 
the patience needed to play the long game. 
Scientists have these traits in abundance, 
along with the credibility and competence 
to make a difference at the table of global 
leadership, which the world certainly needs. 


Nick Pyenson is a research geologist 
and curator of fossil marine mammals at 
the National Museum of Natural History, 
Smithsonian Institution, Washington DC, USA. 
He is the author of Spying on Whales (2018). 
TECHNOLOGIES TO 
WATCH IN 2020 


Thought leaders describe the tech developments that could 
have a big impact in the coming year. By Esther Landhuis 


The virus SH1, reconstructed from images obtained using cryogenic electron microscopy. 


HONGWEI WANG 
BETTER CRYO-EM SAMPLES 


In two or three years, I think that transmission 
cryogenic electron microscopy (cryo-EM) will 
become the most powerful tool for decipher- 
ing the structures of macromolecules. These 
structures are crucial for understanding 
biochemical mechanisms and drug develop- 
ment, and methods for solving them more 
efficiently can speed up such work. 

In cryo-EM, quickly freezing biological 
specimens in liquid nitrogen helps to preserve 
the molecules’ water content and reduces 
damage from the high-energy electrons used 
for imaging. But specimen preparation is a 
major bottleneck: if you don’t have a good 
specimen, you have nothing to image. Biolog- 
ical specimens often contain proteins, which 
unravel at the surface of the thin liquid layers 
used in the freezing process. 

To prevent this unfolding, researchers 
are developing approaches that anchor 
proteins onto two-dimensional materials, 
such as the carbon lattice graphene, before 
applying the liquid droplets. That way, they 
can make the droplets even smaller while 
keeping the protein away from the air-water 
interface. 

Some laboratories place nanolitre-sized 
samples directly on to a surface, instead of 
using cumbersome older methods that draw 
excess liquid away from larger droplets. Other 
methods use a focused ion beam to slice frozen 
cells into layers thinner than 100 nanometres, 
allowing researchers to study molecules in 
their cellular contexts. 

Solving a molecular structure with cryo-EM 
typically requires collecting and analysing as 
many as 10,000 images, representing several 
weeks to a month of work. Many images are 
imperfect, so we have to discard them. But 
theoretically, a few dozen pictures should 
be enough, and it would take less than a day 
to collect and analyse them. This increased 
throughput could help us to understand 
disease mechanisms and develop drugs more 
efficiently. 


Hongwei Wang is a structural biologist at 
Tsinghua University in Beijing. 
SARAH WOODSON 
IMPROVING RNA ANALYSIS 


I’m keeping my eye on long-read RNA 
sequencing and live-cell imaging using 
light-up RNA strands called aptamers. These 
technologies are still maturing, but I expect 
big changes in the next year or two. 

Short-read sequencing has changed the 
field of RNA biology: it can tell you which 
RNA sequences contain a biochemically modified 
residue, for example. However, longer 
reads (for instance, using sequencing technologies 
offered by Oxford Nanopore and 
Pacific Biosciences) can now help to determine 
how common a particular modification 
is in the cell, and whether changes in one part 
of an RNA molecule correlate with changes 
in another. 
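
As a rough illustration of those two questions (how common a modification is, and whether two sites on the same molecule change together), the sketch below works on per-read modification calls. The data layout (one dict per read, mapping a position to a modified/unmodified call), the function names and the toy reads are assumptions made for illustration only; they are not the output format of any particular sequencing platform.

```python
from math import sqrt

def site_frequency(reads, site):
    """Fraction of reads covering `site` that are called modified there."""
    calls = [read[site] for read in reads if site in read]
    return sum(calls) / len(calls) if calls else 0.0

def phi_coefficient(reads, site_a, site_b):
    """Association (-1 to 1) between two sites, using reads that span both."""
    n11 = n10 = n01 = n00 = 0
    for read in reads:
        if site_a in read and site_b in read:
            a, b = read[site_a], read[site_b]
            if a and b:
                n11 += 1
            elif a:
                n10 += 1
            elif b:
                n01 += 1
            else:
                n00 += 1
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical toy data: each long read maps position -> "is it modified?"
reads = [
    {10: True, 200: True},
    {10: True, 200: True},
    {10: False, 200: False},
    {10: False, 200: True},
]
print(site_frequency(reads, 10))        # how common the modification is
print(phi_coefficient(reads, 10, 200))  # do the two sites co-vary?
```

Only a read long enough to span both positions can contribute to the second calculation, which is why this kind of question is out of reach when the two sites fall on separate short reads.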

Light-up aptamers are single-stranded DNA 
or RNA molecules that were developed in the 
lab to bind to fluorescent dyes. They are RNA 
analogues of the green fluorescent protein 
that is produced in some marine animals, 
and when these aptamers bind to the dyes, 
their fluorescence intensity increases. This 
enables researchers to track, for example, the 
formation of intracellular RNA clusters that 
contribute to neurodegenerative diseases. 

Earlier light-up aptamers were unrelia- 
ble: their signals were dim, and sometimes 
the aptamers didn’t work at all because the 
sequences misfolded when fused with the tar- 
get RNA. But several groups have developed 
new types of fluorescent RNA, and in papers 
and talks I’ve seen a huge push to improve the 
brightness of existing aptamers and create 
variants that glow in different colours. 

My lab has used chemical footprinting 
methods to study RNA folding in the cell. 
Many disorders are associated with changes 
in RNA structure, but that has been really hard 
to tease apart. Now we are turning to long-read 
sequencing and light-up aptamers to study 
RNA-protein aggregates in diseases including 
cancer, metabolic syndromes and Alzheimer’s. 
Using these technologies, we can better corre- 
late cell death and other disease features with 
what’s happening to RNA molecules in the cell. 


Sarah Woodson is a biophysicist at Johns 
Hopkins University in Baltimore, Maryland. 
ELHANAN BORENSTEIN 
DECODING THE MICROBIOME 


Over the past decade, methods for sequenc- 
ing the genetic content of microbial commu- 
nities have probed the composition of the 
human microbiome. More recently, scientists 
have tried to learn what the microbiome is 
doing by integrating information about 
genes, transcripts, proteins and metabo- 
lites. Metabolites are especially interesting: 
they could offer the closest understanding 
of how the microbiome affects our health, 
because many host-microbiome interactions 
occur through the metabolites that bacteria 
generate and consume. 

There has been an explosion of microbiome-metabolome 
studies looking at, for 
instance, a set of stool samples, identifying 
the species present in each sample and their 
abundances through metagenomic sequencing, 
and using mass spectrometry and other 
technologies to measure the concentrations 
of different metabolites. By combining these 
two profiles, the hope is to understand which 
member of the microbiome is doing what, and 
thus whether specific microbes determine the 
level of certain metabolites. 

But these data are complex and multi- 
dimensional, and there might be a whole web 
of interactions, involving multiple species 
and pathways, which ultimately produce a 
set of metabolites. Scientists have published 
computational methods to link microbiome 
and metabolome data and to learn these 
quirks and patterns. Such methods range 
from simple correlation-based analyses to 
complex machine-learning approaches that 
use existing microbiome-metabolome data 
sets to predict the metabolome in new micro- 
bial communities, or to recover microbe- 
metabolite relationships. 

Our lab takes a different strategy. Rather 
than apply statistical methods to find 
microbe-metabolite associations, we build 
mechanistic models of how we think a spe- 
cific microbial composition affects the 
metabolome, and use these as part of the 
analyses themselves. In effect, we are ask- 
ing: on the basis of genomic and metabolic 
information, what do we know about each 
microbe’s ability to produce or take up specific 
metabolites? We can then predict the potential 
of a given collection of microbes to produce 
or degrade specific metabolites, and compare 
those predictions with actual metabolomic 
data. We showed that this approach avoids the 
pitfalls of simple correlation-based analyses, 
and will release a new version of the analysis 
framework in the coming months. 
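
The general idea of predicting a community's metabolic potential and checking it against measurements can be sketched in a few lines. This is a toy under stated assumptions, not the lab's framework: the taxon names, the +1/-1 production and uptake table, the abundances and the measured values are all hypothetical, and a plain Pearson correlation stands in for whatever comparison a real analysis would use.

```python
from math import sqrt

# Hypothetical knowledge table: +1 if a taxon's genome suggests it can
# produce the metabolite, -1 if it can take it up.
EFFECT = {
    "taxon_A": {"butyrate": 1.0},
    "taxon_B": {"butyrate": -1.0, "acetate": 1.0},
}

def metabolic_potential(abundances, effect=EFFECT):
    """Sum each taxon's abundance-weighted production or uptake per metabolite."""
    potential = {}
    for taxon, abundance in abundances.items():
        for metabolite, sign in effect.get(taxon, {}).items():
            potential[metabolite] = potential.get(metabolite, 0.0) + abundance * sign
    return potential

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

# Hypothetical samples: relative abundances and measured butyrate levels.
samples = [
    {"taxon_A": 0.8, "taxon_B": 0.2},
    {"taxon_A": 0.5, "taxon_B": 0.5},
    {"taxon_A": 0.2, "taxon_B": 0.8},
]
measured_butyrate = [9.0, 5.5, 1.0]

predicted = [metabolic_potential(s).get("butyrate", 0.0) for s in samples]
agreement = pearson(predicted, measured_butyrate)
print(predicted, agreement)
```

A metabolite whose measured level tracks the predicted potential is plausibly explained by the modelled community; one that does not may depend on factors outside the model, such as diet or host metabolism.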

Such studies could improve microbiome- 
based therapies by identifying, for example, 
specific microbes responsible for producing 
too much of a harmful metabolite or too little 
of a beneficial one. 


Elhanan Borenstein is a computational 
systems biologist at Tel Aviv University, Israel. 


CHRISTINA CURTIS 
COMPUTING CANCER 


When it comes to cancer, we cannot see the 
process by which the disease forms, only its 
end point: we sample a tumour when it has 
become clinically detectable. By then, the 
tumour has acquired many mutations, and 
we're left to work out what happened. 

Our team built a computational model to 
explore the dynamics of tumour progression 
while accounting for tissue spatial structure. 
With this model, you can simulate a range of 
scenarios and generate ‘virtual tumours’ with 
patterns of mutation that mimic patient data. 
By comparing simulated data with actual 
genomic data, it’s possible to infer which 
parameters probably gave rise to a patient’s 
tumour. 
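
The simulate-and-compare logic described above can be illustrated with a deliberately simplified rejection scheme (in the style of approximate Bayesian computation). This sketch is not the team's model: it ignores spatial structure entirely, and the growth rule, the summary statistic (mean mutations per cell), the uniform prior, the tolerance and the observed value are all assumptions chosen to keep the example short.

```python
import random

def simulate_tumour(mu, generations, rng):
    """Grow a 'virtual tumour' by synchronous doubling.

    Each daughter cell gains one new mutation with probability `mu`.
    Returns the mean number of mutations per cell in the final population.
    """
    cells = [0]  # one founder cell carrying zero mutations
    for _ in range(generations):
        daughters = []
        for count in cells:
            for _ in range(2):
                daughters.append(count + (1 if rng.random() < mu else 0))
        cells = daughters
    return sum(cells) / len(cells)

def infer_mu(observed, generations=8, draws=500, tolerance=0.2, seed=0):
    """Keep the parameter draws whose simulated summary is close to the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        mu = rng.random()  # uniform prior on [0, 1)
        simulated = simulate_tumour(mu, generations, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(mu)
    return accepted

# Hypothetical observed summary statistic from a patient sample.
accepted = infer_mu(observed=2.4)
print(len(accepted), "parameter draws are consistent with the observed data")
```

The accepted draws approximate the range of mutation probabilities that could plausibly have produced the observed tumour; a realistic analysis would use richer summary statistics and a spatially explicit simulator.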

I’m excited about complementing these 
inferential approaches with direct measure- 
ments of tumour lineage and phenotype using 
emerging barcoding and recording methods. 
Advances in the past two years include evolving 
CRISPR-based barcodes that can record the 
fate of cells during mammalian development. 
Other techniques use image-based detection 
of DNA barcodes through in situ expression of 
RNA, thereby capturing cellular lineage, spatial 
proximity and phenotypes. 

In a study that modelled the growth of 
tumours in colon cancer, we used tumour 
sequence data and simulations to study relationships 
between primary and metastatic 
tumours. These inferential analyses indicated 
that the vast majority of cancers had spread 
when the primary tumour comprised barely 
100,000 cells, too small to detect using standard 
diagnostic methods such as colonoscopy. 

With better sensitivity and scalability, 
a blend of modelling and measurement 
methods could track both lineage and spatial 
relationships during tumour formation, giving 
insight into cancer’s origins, including how 
specific mutations influence cellular fitness 
and fuel the disease’s progression. 


Christina Curtis is a computational and 
systems biologist at Stanford University, 
California. 


ALEX NORD 


We’re now about 15 years into large-scale 
experiments to map enhancers and other reg- 
ulatory DNA sequences that control how genes 
are read out by cells and organs. Although more 
work is needed to complete these maps, we're 
at the point at which we can harness our understanding 
to control the genome more precisely. 
Software code can be used to build models that simulate tumour development. 




At the Society for Neuroscience annual 
meeting last October in Chicago, Illinois, I 
co-chaired a session that focused on identi- 
fying enhancer sequences and using them to 
control gene expression in specific cell types 
in the brain. One approach delivers engi- 
neered viruses into the brain to test thousands 
of enhancers for the gene-expression profile 
of interest. In 2019, researchers at the Allen 
Institute for Brain Science in Seattle, Washing- 
ton, used this strategy to look for enhancers 
in specific cortical layers in the human brain. 
And a team from Harvard University in Cambridge, 
Massachusetts, used an RNA-sequencing-based 
method to find enhancers that act 
only in specific interneurons, a type of nerve 
cell that creates circuits. 

Once enhancer sequences are identified, 
scientists can use them to drive expression 
in particular cell types for gene-therapy 
applications. In disorders caused by the 
inactivation or deletion of one copy of a 
gene, CRISPR-Cas9 gene-editing tools can 
target transcriptional activators to the 
gene’s enhancer to turn up expression of 
the working copy. Research in mice suggests 
these approaches can correct gene-expres- 
sion deficiencies that lead to obesity and 
to conditions such as fragile-X, Rett and 
Dravet syndromes, the latter a severe 
form of epilepsy that my lab is working on. 
In the coming year, I think we'll still be curing 
mice, but there is a lot of industry investment 
in this technology. The hope is that we can 
use these methods to transform how gene 
therapy is done in humans. 


Alex Nord is a geneticist at the University of 
California, Davis. 


J. CHRISTOPHER LOVE 
SINGLE-CELL SEQUENCING 


I’m interested in how we bring medicines 
to patients faster and more accessibly. The 
technologies required are multifaceted. On 
the one hand, there’s discovery — for example, 
single-cell sequencing methods. On the other 
hand, there’s the matter of getting the technol- 
ogy to the patient — the manufacturing part. 
This is particularly relevant to medicines for 
rare diseases or for small populations, and is 
even applicable to global access to medicines 
we already have. 

On the discovery front, we’ve worked 
with colleagues at the Massachusetts Insti- 
tute of Technology (MIT) in Cambridge to 
develop a portable, inexpensive platform for 
high-throughput, single-cell RNA sequencing. 
But it’s still challenging to get sufficient 
resolution to distinguish between immune- 
cell subtypes, for instance, with different roles 
and antigen specificities. Over the past year 
or so, we’ve enhanced single-cell genomic 
sequencing in several ways. First, we came 
up with a method for detecting low-expression 
transcripts more efficiently. And for T 
lymphocytes specifically, we designed a protocol 
that links each cell’s gene-expression 
profile with the sequence of its unique antigen 
receptor. 

Activated T cells from human blood. 

Meanwhile, a team at the Dana Farber 
Cancer Institute in Boston, Massachusetts, has 
published a clever library-screening strategy 
to address the other side of the equation — 
working out which antigen a particular T-cell 
receptor recognizes. 

With MIT collaborator Alex Shalek and 
others, I have started a company, Honey- 
comb Biotechnologies, to commercialize our 
single-cell RNA-sequencing platform. Instead 
of having to spin down cells in a centrifuge, 
stick them in a tube, freeze it in liquid nitro- 
gen and ship it from Africa, say, you could 
just ship an array of single-cell-sized wells — 
something the size of a USB thumb drive. That 
could make single-cell storage and genomic 
profiling possible for just about any sample 
anywhere in the world. 


J. Christopher Love is a chemical engineer 
at the Koch Institute for Integrative Cancer 
Research at MIT in Cambridge, Massachusetts. 


JENNIFER PHILLIPS-CREMINS 
LINKING GENOME STRUCTURE 
AND FUNCTION 


When you stretch out a single cell’s DNA end 
to end, it’s roughly 2 metres long — yet it has 
to fit into a nucleus with a diameter smaller 
than the head of a pin. The folding patterns 
cannot be random; chromosomes form 3D 
structures that must be spatially and tempo- 
rally regulated across an organism’s lifespan. 

With genomics and imaging advances 
over the past decade, we can now create 
ultra-high-resolution maps of how the 
genome folds. Now the big question is, 
what is the function of each of these folding 
patterns? How do they control fundamental 
processes such as gene expression, DNA 
replication and DNA repair? 

Several synthetic-biology approaches 
could allow us to fold and probe the genome 
across a range of length- and timescales. 
One method, CRISPR-GO, can carry pieces 
of DNA to specific compartments on or in 
the nucleus. This will allow scientists to ask 
how the nuclear placement of DNA sequences 
governs gene function. 

Another is our lab’s light-activated 
dynamic looping (LADL) tool, which uses 
light and CRISPR-Cas9 to tether specific 
pieces of DNA together on demand over long 
distances. This can bring an enhancer into 
direct contact with a target gene thousands 
or even millions of bases away, so we can 
directly assess that regulatory sequence’s 
function: does expression of its target gene 
go up or down, and to what degree? The 
technology allows precise spatio-temporal 
control over gene expression, which is 
critically disrupted in many diseases. 

A third system, CasDrop, uses another 
light-activated CRISPR-Cas9 system to 
pull specific pieces of DNA into subnuclear 
membraneless ‘condensates’. Their function 
in cells has been hotly debated since they were 
discovered a few years ago. 

What inspires me for the future is that we 
can couple these 3D genome-engineering 
tools with CRISPR-based live-cell imaging 
approaches, so that we can both engineer 
and observe the genome in real time in cells. 

Function could drive structure. Or structure 
could drive function. This is a great mystery 
that these engineering tools will allow us to 
answer. 


Jennifer Phillips-Cremins is an epigeneticist 
and bioengineer at the University of 
Pennsylvania, Philadelphia. 
We don’t know the exact number of 
dead insects in the entomology 
collection at the Natural History 
Museum in London, but it’s more 
than 34 million. 

Our collections, for me, are a place of wonder. 
The specimens they contain are the biological 
heritage of the planet: splendid to look at and 
packed with genetic information about the past. 
Some have come to us from across the globe, 
and make me feel how small I am, as part of our 
biosphere. 

The insect collection stretches back hundreds 
of years. For example, we have a robber fly 
caught in 1680 by the queen’s gardener at 
Hampton Court Palace, near London. 

Flies are my focus. Not only are they amazingly 
diverse, but they’re cute. We've got stalk-eyed 
flies; flies that are less than a millimetre in size; 
and my favourites, Mallophora robber flies, 
which look like massive bumblebees and are 
highly venomous. I also have a soft spot for 
botflies, one species of which (Cephalopina 
titillator) matures in camels’ nostrils. 

The collection isn’t static; there’s so much 
research going on. We’re always updating 
nomenclature, revising evolutionary family 
trees and describing new species. 

The museum lends specimens by post, and 
we host not just scientists, but visitors such as 
designers looking for inspiration. We’re also 
trying to digitize the entire collection so that 
anyone can access it. 

I’m collaborating with Mara Lawniczak at 
the Wellcome Sanger Institute in Hinxton, UK, 
on what we call Project Neandersquito. We're 
trying to recover genomes from mosquito 
samples collected over the past century. In the 
past, people would cut off legs or destroy whole 
specimens — which fills a curator like me with 
terror. Instead, we are washing the specimens 
with chemical solutions to extract DNA. 

Genetic analysis will help us to distin- 
guish between old mosquito specimens that 
look similar, and to learn how populations have 
changed. For example, we hope to see when 
genes for insecticide resistance arose. 


Erica McAlister is a senior curator at the 
Natural History Museum in London, UK, and 
author of The Secret Life of Flies (Firefly Books, 
2017). Interview by Amber Dance. 


